Business Problem
# Swire wants to forecast commodity costs to support business strategy. They aim to balance higher sales from a lower product price against pricing high enough for stronger margins. Forecasting costs will help them plan future pricing, and understanding commodity costs will also help Swire optimize its procurement strategy and inventory management.
# The analysis will be successful if it accurately predicts price. Strong accuracy on held-out test data will give confidence in the model. The project will be executed by our team. We are looking not only for an accurate model but also for an interactive dashboard that is easy for Swire to use, allowing them to select individual elements of the prediction for customized observation. Any data found to improve predictions could be added in the future.
# An important part of Swire's business is understanding commodity pricing: it impacts production, margins, and business planning. An accurate time-series price forecast will support strategy, and because future planning is key for any business, the forecast will help Swire plan its resources.
Analytical Objective
# The desired outcome is a strong forecast model for commodity price. Key metrics will include p-values, coefficients, and AIC values, among others. These metrics can be shared with stakeholders so they understand the strength of the model as they use it. Multiple models will be created to compare predictions and metrics. After an analysis of each model's performance, the best-performing model will be selected. The ultimate objective is to build a prediction model that Swire can use to inform decisions for future planning.
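As a minimal sketch of how candidate models can be compared on these metrics (the series and model names here are placeholders, not Swire data):

```r
library(forecast)

# Stand-in monthly series; real data is loaded in the Data Load section
set.seed(42)
y <- ts(cumsum(rnorm(120)), frequency = 12)

# Fit two hypothetical ARIMA candidates and compare by AICc (lower is better)
fits <- list(
  arima_211 = Arima(y, order = c(2, 1, 1)),
  arima_011 = Arima(y, order = c(0, 1, 1))
)
sapply(fits, function(f) f$aicc)
```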
Packages
library(quantmod)
library(ggplot2)
library(forecast)
library(astsa)
library(tseries)
library(tidyverse)
library(caret)
library(bigrquery)
library(prophet)
library(lubridate)
library(openxlsx)
library(dplyr)
library(readr)
library(feasts)
library(tsibble)
library(Metrics)
library(rugarch)
library(rmgarch)
Data Load
coffee <- read.csv("Coffee_prices.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
Aluminum <- read.csv("Aluminium_end_price.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
sugar <- read.csv("sugar-prices-historical-chart-data.csv", stringsAsFactors = FALSE) %>%
  mutate(date = mdy(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
soybean <- read.csv("soybean-prices-historical-chart-data.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
soybeanoil <- read.csv("soybean-oil-prices-historical-chart-data.csv", stringsAsFactors = FALSE) %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
corn <- read.csv("corn-prices-historical-chart-data.csv") %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
cotton <- read.csv("cotton-prices-historical-chart-data.csv") %>%
  mutate(date = ymd(date)) %>%
  mutate(month = format_ISO8601(date, precision = "ym"))
Exploring Sugar
# Looking at the price of sugar over the last 30 years
ggplot(sugar, aes(date, price)) +
geom_line() +
labs(title = "Sugar Prices") + xlab("year") + ylab("prices")

#Summary of sugar
summary(sugar)
## price date month
## Min. :0.0438 Min. :1993-01-04 Length:7581
## 1st Qu.:0.0971 1st Qu.:2000-07-31 Class :character
## Median :0.1214 Median :2008-03-12 Mode :character
## Mean :0.1340 Mean :2008-02-23
## 3rd Qu.:0.1672 3rd Qu.:2015-09-18
## Max. :0.3531 Max. :2023-02-17
#Significant fluctuations in price.
#Sugar monthly average
sugar_m <- sugar %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows Sugar
sugar_m <- sugar_m[!duplicated(sugar_m),]
#Creating Time series data for Sugar
sugar_ts <- ts(sugar_m$mon_avg,start=c(1993),frequency=12)
#Plotting
chartSeries(sugar_ts)

#Looking at trends
autoplot(decompose((sugar_ts)),main="")

#Price spikes occur between 2010-2012 and again in the early 2020s
#Sugar Looking for daily or weekly trends
sugar_r <- sugar %>%
filter(date >= as.Date('2022-11-01') & date <= as.Date('2023-01-31'))
ggplot(sugar_r, aes(date, price)) +
geom_line() +
labs(title = "Sugar Prices Daily") + xlab("time") + ylab("prices")

# There do not appear to be trends by day of the week.
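To go beyond visual inspection, the same data frame can be summarized by weekday; a minimal sketch using the existing `sugar` columns, where a flat profile would support the no-weekday-effect conclusion:

```r
library(dplyr)
library(lubridate)

# Average sugar price by day of week
sugar %>%
  mutate(weekday = wday(date, label = TRUE)) %>%
  group_by(weekday) %>%
  summarise(avg_price = mean(price, na.rm = TRUE))
```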
#Sugar Prices Controlling for inflation
#write.csv(sugar_m, file="sugar_m.csv",row.names = FALSE)
sugar_CPI <- read.csv("Sugar_cpi.csv")
sugar_adj <- left_join(sugar_m, sugar_CPI, by = c("month" = "date"))
sugar_adj$adj_price <- sugar_adj$mon_avg / sugar_adj$Value
sugar_ts_adj <- ts(sugar_adj$adj_price, start = c(1993), frequency = 12)
ggplot(sugar, aes(date, price)) +
geom_line() +
labs(title = "Sugar Prices") + xlab("year") + ylab("prices")


ggplot(data = sugar_adj, aes(x = month, y = adj_price, group = 1)) +
geom_line() +
labs(x = "Month", y = "Adjusted Sugar Price", title = "Sugar Prices Adjusted for Inflation")

ggplot(sugar_adj, aes(x = month)) +
geom_line(aes(y = Value, color = "Inflation Rate", group = 1)) +
geom_line(aes(y = mon_avg, color = "Sugar Price", group = 1)) +
scale_color_manual(values = c("blue", "red")) +
xlab("Month") +
ylab("Value") +
ggtitle("Inflation Rate and Sugar Prices over Time")

sugar_adj$norm_value <- scale(sugar_adj$Value)
sugar_adj$norm_mon_avg <- scale(sugar_adj$mon_avg)
# Calculate the adjusted sugar price by dividing the monthly average sugar price by the CPI value
# Plot the normalized values for 'Value' and 'mon_avg' and the adjusted sugar price as lines
#changing month to a date
sugar_adj$month <- as.Date(paste0(sugar_adj$month, "-01"))
ggplot(sugar_adj, aes(x = month)) +
geom_line(aes(y = norm_value, color = "CPI", group = 1)) +
geom_line(aes(y = norm_mon_avg, color = "Sugar Price", group = 1)) +
scale_color_manual(values = c("darkblue", "orange")) +
xlab("Date") +
ylab("Normalized Values") +
ggtitle("Inflation Rate vs. Price - Sugar") +
scale_x_date(date_labels = "%Y")
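The visual comparison can be backed with a single number; a sketch using the columns already in `sugar_adj`:

```r
# Correlation between the monthly average sugar price and the CPI series;
# complete.obs drops months where either value is missing
cor(sugar_adj$mon_avg, sugar_adj$Value, use = "complete.obs")
```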

#Seasonal Plots
#Format back to date in aggregated month column
sugar_m %>%
mutate(month = ym(month)) -> Sugar_mon
#Convert to tibble
Sugar_mon <- as_tibble(Sugar_mon)
#convert data frame into time series tsibble object
Sugar_mon %>% as_tsibble(index = month) -> Sugar_mon_ts
#Format data for Sugar seasonal plots
Sugar_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> Sugar_sea_ts
#Sugar seasonal plot by month
autoplot(Sugar_sea_ts, mon_avg) +
ylab("monthly Sugar prices") +
xlab("")

#Different view Sugar seasonal plot
Sugar_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Sugar prices")

#Sugar seasonal subseries plot
Sugar_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Sugar prices", title = "Seasonal subseries plot: Sugar prices")

ARIMA Sugar
##Stationary Test
adf.test(sugar_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: sugar_ts
## Dickey-Fuller = -2.3631, Lag order = 7, p-value = 0.4236
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(sugar_ts), alternative ="stationary")
## Warning in adf.test(diff(sugar_ts), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(sugar_ts)
## Dickey-Fuller = -7.8345, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing lowers the p-value below 0.05, indicating stationarity
#First-order differencing suffices, making the value of (d)=1
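As a cross-check on the choice of d, `forecast::ndiffs()` estimates the required differencing order directly from unit-root tests; a sketch on simulated data, since the result depends on the series:

```r
library(forecast)

set.seed(42)
# Simulated random-walk stand-in for a non-stationary price series
x <- ts(cumsum(rnorm(120)), start = c(1993, 1), frequency = 12)
ndiffs(x)        # estimated d for the raw series
ndiffs(diff(x))  # estimated d after one difference (typically 0)
```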
#Custom ARIMA Model
#Correlation plot and tuning selection. The ACF shows how price correlates with its own prior periods; the MA order (q) is chosen from where its spikes cut off.
#ACF (q)
acf(diff(sugar_ts),main='')

#q=2
#Looking at the direct impact prior periods have on the current period
#PACF (p)
pacf(diff(sugar_ts),main='')

#p=2. There are 2 significant partial autocorrelation values
# ARIMA Custom
sugar_fit<- Arima(sugar_ts, order=c(2,1,2))
BIC(sugar_fit)
## [1] -2226.736
# Calculate the AICc value: AICc = AIC + 2k(k+1)/(n-k-1)
n <- length(sugar_ts)
k <- length(coef(sugar_fit))
sugar_fit_AICc <- AIC(sugar_fit) + 2*k*(k+1)/(n-k-1)
sugar_fit_AICc
## [1] -2246.069
sugar_fit
## Series: sugar_ts
## ARIMA(2,1,2)
##
## Coefficients:
##          ar1      ar2      ma1      ma2
##       0.9220  -0.0449  -0.5155  -0.3888
## s.e.  0.1624   0.1843   0.1571   0.1996
##
## sigma^2 = 0.0001142:  log likelihood = 1128.09
## AIC=-2246.18   AICc=-2246.01   BIC=-2226.74
# ARIMA Alternate Custom
sugar_fit2<- Arima(sugar_ts, order=c(2,1,3))
sugar_fit2
## Series: sugar_ts
## ARIMA(2,1,3)
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3
## 0.0427 0.8469 0.3682 -0.9154 -0.4054
## s.e. 0.1680 0.1584 0.1714 0.0975 0.0836
##
## sigma^2 = 0.0001144: log likelihood = 1128.25
## AIC=-2244.51 AICc=-2244.27 BIC=-2221.17
forecast::accuracy(sugar_fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005395214 0.01060471 0.007528664 0.009326486 5.68556 0.2399881
## ACF1
## Training set -0.006954922
#Check residuals
checkresiduals(sugar_fit2)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,3)
## Q* = 29.583, df = 19, p-value = 0.05735
##
## Model df: 5. Total lags used: 24
#Auto-fit Arima
auto_sugar<- auto.arima(sugar_ts)
auto_sugar
## Series: sugar_ts
## ARIMA(0,1,1)(0,0,2)[12]
##
## Coefficients:
## ma1 sma1 sma2
## 0.4213 0.1814 0.0903
## s.e. 0.0488 0.0524 0.0537
##
## sigma^2 = 0.0001105: log likelihood = 1133.31
## AIC=-2258.62 AICc=-2258.51 BIC=-2243.06
##Forecast Plot
##Forecast Custom
autoplot(forecast::forecast(sugar_fit, h=12, level=c(80,95)))

##Forecast Custom 2
autoplot(forecast::forecast(sugar_fit2, h=12, level=c(80,95)))

##Forecast Auto
autoplot(forecast::forecast(auto_sugar, h=12, level=c(80,95)))

#ARIMA using more recent data
sugar_r2 <- sugar %>%
filter(date >= as.Date('2019-01-01') & date <= as.Date('2023-01-31'))
sugar_r2 <- sugar_r2 %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows Sugar
sugar_r2 <- sugar_r2[!duplicated(sugar_r2),]
sugar_tsr <- ts(sugar_r2$mon_avg,start=c(2019),frequency=12)
##Stationary Test
adf.test(sugar_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: sugar_tsr
## Dickey-Fuller = -2.4651, Lag order = 3, p-value = 0.388
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(sugar_tsr), alternative ="stationary")
##
## Augmented Dickey-Fuller Test
##
## data: diff(sugar_tsr)
## Dickey-Fuller = -3.5424, Lag order = 3, p-value = 0.04751
## alternative hypothesis: stationary
#Differencing lowers the p-value below 0.05, indicating stationarity
#The custom model below uses (d)=2
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(sugar_tsr),main='')

#q=0
#Looking at the direct impact prior periods have on the current period
#PACF (p)
pacf(diff(sugar_tsr),main='')

#p=0. No significant partial autocorrelation values remain
# ARIMA Custom
sugar_fitR<- Arima(sugar_tsr, order=c(0,2,0))
sugar_fitR
## Series: sugar_tsr
## ARIMA(0,2,0)
##
## sigma^2 = 0.0001337: log likelihood = 142.93
## AIC=-283.86 AICc=-283.77 BIC=-282.01
forecast::accuracy(sugar_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set -7.134492e-05 0.01132439 0.008501457 0.02670897 6.032758 0.3216336
## ACF1
## Training set -0.2904199
#Check residuals
checkresiduals(sugar_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,0)
## Q* = 23.23, df = 10, p-value = 0.00993
##
## Model df: 0. Total lags used: 10
#Auto-fit Arima
auto_sugarR<- auto.arima(sugar_tsr)
auto_sugarR
## Series: sugar_tsr
## ARIMA(0,1,1)(0,0,1)[12]
##
## Coefficients:
## ma1 sma1
## 0.3711 0.5497
## s.e. 0.1442 0.2408
##
## sigma^2 = 6.251e-05: log likelihood = 163.02
## AIC=-320.04 AICc=-319.49 BIC=-314.42
##Forecast Plot
autoplot(forecast::forecast(auto_sugarR, h=12, level=c(80,95)))

##Forecast Custom
autoplot(forecast::forecast(sugar_fitR, h=12, level=c(80,95)))

#Printing Predictions
sugar_predictions <- forecast::forecast(auto_sugarR,h=12)
print(sugar_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2023           0.1980707 0.2017292 0.2008702 0.1972369 0.1956776 0.1919627
## 2024 0.2038595
##            Aug       Sep       Oct       Nov       Dec
## 2023 0.1876254 0.1872578 0.1898181 0.1962718 0.2008475
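For downstream use (e.g. the dashboard), the forecast means can be paired with month labels for export; a sketch assuming the forecast horizon starts in February 2023, immediately after the last observed month:

```r
# Build an exportable frame of month labels and forecast means
pred_months <- seq(as.Date("2023-02-01"), by = "month", length.out = 12)
sugar_pred_df <- data.frame(
  month = format(pred_months, "%Y-%m"),
  forecast = as.numeric(sugar_predictions$mean)
)
# write.csv(sugar_pred_df, "sugar_predictions.csv", row.names = FALSE)
```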
Sugar GARCH
# Model Creation
garch_model <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "std")
# Fitting Model to Data
sugar_garch <- ugarchfit(spec = garch_model, data = sugar_ts)
sugar_vol <-ts(sugar_garch@fit$sigma^2,start=c(1993),frequency=12)
print(sugar_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.116391 0.001092 106.62385 0.000000
## omega 0.000051 0.000012 4.22398 0.000024
## alpha1 0.977824 0.094878 10.30608 0.000000
## beta1 0.021175 0.043050 0.49188 0.622804
## shape 99.999556 70.368010 1.42109 0.155289
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.116391 0.002521 46.16956 0.000000
## omega 0.000051 0.000015 3.34398 0.000826
## alpha1 0.977824 0.052135 18.75546 0.000000
## beta1 0.021175 0.038591 0.54871 0.583201
## shape 99.999556 10.484237 9.53809 0.000000
##
## LogLikelihood : 756.3023
##
## Information Criteria
## ------------------------------------
##
## Akaike -4.1508
## Bayes -4.0971
## Shibata -4.1512
## Hannan-Quinn -4.1295
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 248.9 0
## Lag[2*(p+q)+(p+q)-1][2] 345.9 0
## Lag[4*(p+q)+(p+q)-1][5] 597.8 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 4.891 0.02700
## Lag[2*(p+q)+(p+q)-1][5] 7.625 0.03648
## Lag[4*(p+q)+(p+q)-1][9] 8.266 0.11428
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.2124 0.500 2.000 0.6449
## ARCH Lag[5] 1.0171 1.440 1.667 0.7279
## ARCH Lag[7] 1.0949 2.315 1.543 0.8976
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 10.7911
## Individual Statistics:
## mu 4.1788
## omega 0.4136
## alpha1 0.2436
## beta1 0.6172
## shape 8.4986
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.4952 0.1357
## Negative Sign Bias 0.5157 0.6064
## Positive Sign Bias 0.6158 0.5384
## Joint Effect 2.7621 0.4298
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 325.8 9.816e-58
## 2 30 358.8 1.509e-58
## 3 40 356.7 6.333e-53
## 4 50 390.5 9.785e-55
##
##
## Elapsed time : 0.154815
#plot(sugar_garch, which = 1)
coef(sugar_garch)
## mu omega alpha1 beta1 shape
## 1.163906e-01 5.093451e-05 9.778237e-01 2.117538e-02 9.999956e+01
# Forecasting
horizon <- 3
sugar_forecast_garch <- ugarchforecast(sugar_garch, n.ahead = horizon)
forecast_mean_sugar <- as.numeric(sugar_forecast_garch@forecast$seriesFor)
actual_values_sugar <- as.numeric(window(sugar_vol, start = c(1993, 1)))
#plot(sugar_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
# main = "GARCH Forecast for Sugar Prices", ylab = "Price", xlab = "Time") #%>%
#lines(sugar_ts[(length(sugar_ts)-horizon+1):length(sugar_ts)], col = "blue")
#GARCH Take 2
#Use the ARIMA orders identified above via acf/pacf; review each fitted model
sugar_fit
## Series: sugar_ts
## ARIMA(2,1,2)
##
## Coefficients:
##          ar1      ar2      ma1      ma2
##       0.9220  -0.0449  -0.5155  -0.3888
## s.e.  0.1624   0.1843   0.1571   0.1996
##
## sigma^2 = 0.0001142:  log likelihood = 1128.09
## AIC=-2246.18   AICc=-2246.01   BIC=-2226.74
sugar_fit2
## Series: sugar_ts
## ARIMA(2,1,3)
##
## Coefficients:
##          ar1     ar2     ma1      ma2      ma3
##       0.0427  0.8469  0.3682  -0.9154  -0.4054
## s.e.  0.1680  0.1584  0.1714   0.0975   0.0836
##
## sigma^2 = 0.0001144:  log likelihood = 1128.25
## AIC=-2244.51   AICc=-2244.27   BIC=-2221.17
sugar_fitR
## Series: sugar_tsr
## ARIMA(0,2,0)
##
## sigma^2 = 0.0001337:  log likelihood = 142.93
## AIC=-283.86   AICc=-283.77   BIC=-282.01
auto_sugar
## Series: sugar_ts
## ARIMA(0,1,1)(0,0,2)[12]
##
## Coefficients:
##          ma1    sma1    sma2
##       0.4213  0.1814  0.0903
## s.e.  0.0488  0.0524  0.0537
##
## sigma^2 = 0.0001105:  log likelihood = 1133.31
## AIC=-2258.62   AICc=-2258.51   BIC=-2243.06
auto_sugarR
## Series: sugar_tsr
## ARIMA(0,1,1)(0,0,1)[12]
##
## Coefficients:
##          ma1    sma1
##       0.3711  0.5497
## s.e.  0.1442  0.2408
##
## sigma^2 = 6.251e-05:  log likelihood = 163.02
## AIC=-320.04   AICc=-319.49   BIC=-314.42
#Based on the comparisons above, try a GARCH spec with an AR(1) mean model
garch_model2 <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std")
sugar_garch2 <- ugarchfit(spec = garch_model2, data = sugar_ts)
sugar_garch2
#Time Series based on volatility or Variance based on a standard Garch [1,1] model
sugar_vol <-ts(sugar_garch2@fit$sigma^2,start=c(1993),frequency=12)
plot(sugar_vol,xlab="",ylab="",main="Sugar_Volatility (GARCH[1,1])")

#Exponential GARCH (does not work quite as well)
Egarch_model <- ugarchspec(
  variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "std")
sugar_egarch2 <- ugarchfit(spec = Egarch_model, data = sugar_ts)
sugar_egarch2
sugar_forecast_garche <- ugarchforecast(sugar_egarch2, n.ahead = horizon)
forecast_mean_sugare <- as.numeric(sugar_forecast_garche@forecast$seriesFor)
actual_values_sugare <- as.numeric(window(sugar_vol, start = c(1993, 1)))
#Time Series based on volatility or Variance based on a standard Garch [1,1] model
esugar_vol <-ts(sugar_egarch2@fit$sigma^2,start=c(1993),frequency=12)
#plot(esugar_vol,xlab="",ylab="",main="Sugar_Volatility (eGARCH[1,1])")
cor(sugar_vol, esugar_vol)
## [1] 1
ts.plot(sugar_vol,esugar_vol,col=c("green","red"),xlab="")
legend("topright",legend=c("Standard","Exponential"),col=c("green","red"),lty=c(1,1))

#No difference shown
#GARCH 3
names(sugar_garch2@model)
##  [1] "modelinc"   "modeldesc"  "modeldata"  "pars"       "start.pars"
##  [6] "fixed.pars" "maxOrder"   "pos.matrix" "fmodel"     "pidx"
## [11] "n.start"
names(sugar_garch2@fit)
##  [1] "hessian"         "cvar"            "var"             "sigma"
##  [5] "condH"           "z"               "LLH"             "log.likelihoods"
##  [9] "residuals"       "coef"            "robust.cvar"     "A"
## [13] "B"               "scores"          "se.coef"         "tval"
## [17] "matcoef"         "robust.se.coef"  "robust.tval"     "robust.matcoef"
## [21] "fitted.values"   "convergence"     "kappa"           "persistence"
## [25] "timer"           "ipars"           "solver"
#Variance
sugar_garch_var <- sugar_garch2@fit$var
#Residuals
sugar_garch_res <- (sugar_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
plot(sugar_garch_res, type = "l")
lines(sugar_garch_var, col = "green")
sugar_forecast_garch2 <- ugarchforecast(sugar_garch2, n.ahead = 12)
sugar_forecast_garch2
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
##
## 0-roll forecast [T0=Feb 2023]:
## Series Sigma
## T+1 0.1164 0.09664
## T+2 0.1164 0.09685
## T+3 0.1164 0.09707
## T+4 0.1164 0.09728
## T+5 0.1164 0.09749
## T+6 0.1164 0.09770
## T+7 0.1164 0.09792
## T+8 0.1164 0.09813
## T+9 0.1164 0.09834
## T+10 0.1164 0.09855
## T+11 0.1164 0.09876
## T+12 0.1164 0.09896
sugar_forecast_values <- as.numeric(sugar_forecast_garch2@forecast$series)
print(sugar_forecast_values)
## [1] 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906
## [8] 0.1163906 0.1163906 0.1163906 0.1163906 0.1163906
sugar_forecast_index <- sugar_forecast_garch2@forecast$seriesFor
sugar_forecast <- data.frame(time = sugar_forecast_index, forecast = sugar_forecast_values)
print(sugar_forecast)
## Feb.2023 forecast
## T+1 0.1163906 0.1163906
## T+2 0.1163906 0.1163906
## T+3 0.1163906 0.1163906
## T+4 0.1163906 0.1163906
## T+5 0.1163906 0.1163906
## T+6 0.1163906 0.1163906
## T+7 0.1163906 0.1163906
## T+8 0.1163906 0.1163906
## T+9 0.1163906 0.1163906
## T+10 0.1163906 0.1163906
## T+11 0.1163906 0.1163906
## T+12 0.1163906 0.1163906
#write.csv(sugar_forecast, file="sugar_forecast.csv",row.names = FALSE)
sugar_garch2_fitted <- fitted(sugar_garch2)
#print(sugar_garch2_fitted)
#print(sugar_forecast_values)
#summary(sugar_forecast_garch2)
ug_sugar <- sugar_forecast_garch2@forecast$sigmaFor
#plot(ug_sugar, type = "l")
sug_var_t <- c(tail(sugar_garch_var, 20), rep(NA, 10)) # last 20 observations
sug_res_t <- c(tail(sugar_garch_res, 20), rep(NA, 10)) # last 20 observations
sug_f <- c(rep(NA, 20), (ug_sugar)^2)
plot(sug_res_t, type = "l") # Squared residuals
lines(sug_f, col = "orange") # Forecast variance
lines(sug_var_t, col = "green") # Conditional variance

#Plot Predictions
sug_mean_forecast <- as.numeric(sugar_forecast_garch2@forecast$seriesFor)
# Get the upper and lower confidence intervals for both 95% and 80%
sug_conf_int_95 <- as.numeric(sugar_forecast_garch2@forecast$upper[, "95%"]) # 95% confidence interval
sug_conf_int_80 <- as.numeric(sugar_forecast_garch2@forecast$upper[, "80%"]) # 80% confidence interval
# Plot the mean forecasted values with the two confidence intervals
#plot(sugar_forecast_garch2, main = "Forecasted coffee Prices (GARCH(1,1))")
Sugar Model Comparison
#ARIMA Models
forecast::accuracy(sugar_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0003955322 0.01061327 0.007561439 -0.0316936 5.710755 0.2410328
## ACF1
## Training set -0.001346786
forecast::accuracy(sugar_fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005395214 0.01060471 0.007528664 0.009326486 5.68556 0.2399881
## ACF1
## Training set -0.006954922
forecast::accuracy(auto_sugar)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0002209403 0.0104548 0.007462305 0.04185269 5.656067 0.2378728
## ACF1
## Training set -0.005221658
forecast::accuracy(sugar_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set -7.134492e-05 0.01132439 0.008501457 0.02670897 6.032758 0.3216336
## ACF1
## Training set -0.2904199
forecast::accuracy(auto_sugarR)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0007288199 0.007660754 0.005699677 0.3909611 3.96685 0.2156345
## ACF1
## Training set -0.04453725
AIC(sugar_fit2)
## [1] -2244.505
BIC(sugar_fit2)
## [1] -2221.172
#GARCH Model
actual_values_sugar <- as.numeric(window(sugar_ts))
actual_values_sugar <- head(actual_values_sugar, length(forecast_mean_sugar))
mae <- mean(abs(forecast_mean_sugar - actual_values_sugar))
mse <- mean((forecast_mean_sugar - actual_values_sugar)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 0.0221829354226844
cat(paste("MSE: ", mse, "\n"))
## MSE: 0.000625796954725992
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 0.0250159340166621
#GARCH Model 2
forecast_mean_sugar2 <- as.numeric(sugar_forecast_garch2@forecast$seriesFor)
actual_values_sugar2 <- as.numeric(window(sugar_ts))
actual_values_sugar2 <- head(actual_values_sugar2, length(forecast_mean_sugar2))
sug_mae <- mean(abs(forecast_mean_sugar2 - actual_values_sugar2))
sug_mse <- mean((forecast_mean_sugar2 - actual_values_sugar2)^2)
sug_rmse <- sqrt(sug_mse)
# Print the results
cat(paste("MAE: ", sug_mae, "\n"))
## MAE: 0.014777997147131
cat(paste("MSE: ", sug_mse, "\n"))
## MSE: 0.000302609589889141
cat(paste("RMSE: ", sug_rmse, "\n"))
## RMSE: 0.0173956773334395
#eGARCH
forecast_mean_sugare <- as.numeric(sugar_forecast_garche@forecast$seriesFor)
actual_values_sugare <- as.numeric(window(sugar_ts))
actual_values_sugare <- head(actual_values_sugar2, length(forecast_mean_sugare))
esug_mae <- mean(abs(forecast_mean_sugare - actual_values_sugare))
esug_mse <- mean((forecast_mean_sugare - actual_values_sugare)^2)
esug_rmse <- sqrt(esug_mse)
# Print the results
cat(paste("MAE: ", esug_mae, "\n"))
## MAE: 0.0221829354226844
cat(paste("MSE: ", esug_mse, "\n"))
## MSE: 0.000625796954725992
cat(paste("RMSE: ", esug_rmse, "\n"))
## RMSE: 0.0250159340166621
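The MAE/MSE/RMSE blocks above repeat the same arithmetic for each model, which is how the variable-name mix-ups crept in. A small helper is one way to avoid that; this is a sketch, and the function name `forecast_metrics` is our own, not part of any package:

```r
# Hypothetical helper: compute MAE, MSE, and RMSE for a forecast vs. actuals.
# Truncates both vectors to the shorter length before comparing.
forecast_metrics <- function(forecast, actual) {
  n <- min(length(forecast), length(actual))
  err <- forecast[seq_len(n)] - actual[seq_len(n)]
  c(MAE = mean(abs(err)), MSE = mean(err^2), RMSE = sqrt(mean(err^2)))
}
# Example: forecast_metrics(forecast_mean_sugar, as.numeric(sugar_ts))
```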
Exploring Coffee
# Looking at the price of Coffee over the last 30 years
ggplot(coffee, aes(date, value)) +
geom_line() +
labs(title = "Coffee Price over Time")

#Summary of coffee
summary(coffee)
## date value month
## Min. :1993-01-04 Min. :0.4250 Length:7582
## 1st Qu.:2000-07-31 1st Qu.:0.9995 Class :character
## Median :2008-03-12 Median :1.2070 Mode :character
## Mean :2008-02-23 Mean :1.2816
## 3rd Qu.:2015-09-17 3rd Qu.:1.5115
## Max. :2023-02-17 Max. :3.1480
##Significant changes in price
#Coffee monthly average
coffee_m <- coffee %>% group_by(month) %>% mutate(mon_avg = mean(value))%>%
select(month, mon_avg)
#Drop Duplicate rows
coffee_m <- coffee_m[!duplicated(coffee_m),]
#Creating Time series data
coffee_ts <- ts(coffee_m$mon_avg,start=c(1993),frequency=12)
#Plotting
chartSeries(coffee_ts)

## Significant increases in price in the mid-1990s, 2010-2012, and the early 2020s
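The group_by() + mutate() + duplicate-drop pattern used to build coffee_m can be collapsed into a single summarise() call; a sketch producing the same one-row-per-month result:

```r
# One row per month with the monthly average price; no deduplication needed
coffee_m2 <- coffee %>%
  group_by(month) %>%
  summarise(mon_avg = mean(value), .groups = "drop")
```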
#Looking at trends
autoplot(decompose((coffee_ts)),main="")

#Coffee looking for daily or weekly trends
coffee_r <- coffee %>%
filter(date >= as.Date('2022-11-01') & date <= as.Date('2023-01-31'))
ggplot(coffee_r, aes(date, value)) +
geom_line() +
labs(title = "Coffee Prices Daily") + xlab("time") + ylab("prices")

## No trends by day of week seen
#Seasonal Plots
#Format back to date in aggregated month column
coffee_m %>%
mutate(month = ym(month)) -> Coffee_mon
#Convert to tibble
Coffee_mon <- as_tibble(Coffee_mon)
#convert data frame into time series tsibble object
Coffee_mon %>% as_tsibble(index = month) -> Coffee_mon_ts
#Format data for Coffee seasonal plots
Coffee_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> Coffee_sea_ts
#Coffee seasonal plot by month
autoplot(Coffee_sea_ts, mon_avg) +
ylab("monthly Coffee prices") +
xlab("")

#Different view Coffee seasonal plot
Coffee_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Coffee prices")

#Coffee seasonal subseries plot
Coffee_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Coffee prices", title = "Seasonal subseries plot: Coffee prices")

Coffee ARIMA
##Stationary Test
adf.test(coffee_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: coffee_ts
## Dickey-Fuller = -3.2651, Lag order = 7, p-value = 0.07714
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(coffee_ts), alternative ="stationary")
## Warning in adf.test(diff(coffee_ts), alternative = "stationary"): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(coffee_ts)
## Dickey-Fuller = -5.9334, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing lowers the p-value below 0.05, making the test significant
#The data is now stationary, so (d)=1
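The differencing order chosen by inspection can be cross-checked programmatically; forecast::ndiffs() estimates the number of differences required using a unit-root test. A sketch on the same series:

```r
# Estimated number of first differences needed (KPSS test by default)
forecast::ndiffs(coffee_ts)
```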
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(coffee_ts),main='')

#q=4
#Looking at the impact that prior periods' prices have on subsequent periods
#PACF (p)
pacf(diff(coffee_ts),main='')

#p=4. There are 4 significant partial autocorrelation values
# ARIMA Custom (best fit)
coffee_fit<- Arima(coffee_ts, order=c(4,1,4))
coffee_fit
## Series: coffee_ts
## ARIMA(4,1,4)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 0.5954 0.0770 -0.9527 0.5519 -0.4654 -0.0217 0.9312 -0.4830
## s.e. 0.2993 0.0552 0.0632 0.2614 0.3128 0.0566 0.0562 0.2851
##
## sigma^2 = 0.01271: log likelihood = 279.38
## AIC=-540.75 AICc=-540.24 BIC=-505.75
# ARIMA Alternate Custom
coffee_fit2<- Arima(coffee_ts, order=c(4,1,5))
coffee_fit2
## Series: coffee_ts
## ARIMA(4,1,5)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## -0.1247 0.6060 0.2163 0.1729 0.2508 -0.4745 -0.2219 -0.3044
## s.e. 0.2759 0.2615 0.2585 0.2339 0.2706 0.2508 0.2373 0.2231
## ma5
## -0.2263
## s.e. 0.0604
##
## sigma^2 = 0.01249: log likelihood = 282.85
## AIC=-545.69 AICc=-545.07 BIC=-506.81
#Check residuals
checkresiduals(coffee_fit)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,4)
## Q* = 18.839, df = 16, p-value = 0.2771
##
## Model df: 8. Total lags used: 24
#Auto-fit Arima
auto_coffee<- auto.arima(coffee_ts)
auto_coffee
## Series: coffee_ts
## ARIMA(0,1,3)
##
## Coefficients:
## ma1 ma2 ma3
## 0.1286 0.1538 0.0843
## s.e. 0.0521 0.0550 0.0520
##
## sigma^2 = 0.01278: log likelihood = 276.15
## AIC=-544.3 AICc=-544.19 BIC=-528.74
##Forecast Plots
##Forecast Custom
autoplot(forecast::forecast(coffee_fit, h=12, level=c(80,95)))

##Forecast Custom 2
autoplot(forecast::forecast(coffee_fit2, h=12, level=c(80,95)))

##Forecast Auto
autoplot(forecast::forecast(auto_coffee, h=12, level=c(80,95)))
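The accuracy figures reported throughout this document are training-set errors; holding out recent months would give the testing accuracy called for in the business problem. A sketch, with the split date chosen purely for illustration:

```r
# Hold out the final months as a test set and score the forecast against them
coffee_train <- window(coffee_ts, end = c(2021, 12))
coffee_test  <- window(coffee_ts, start = c(2022, 1))
fit_train <- Arima(coffee_train, order = c(4,1,4))
fc <- forecast::forecast(fit_train, h = length(coffee_test))
# accuracy() then reports both a "Training set" and a "Test set" row
forecast::accuracy(fc, coffee_test)
```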

#ARIMA using more recent data
coffee_r2 <- coffee %>%
filter(date >= as.Date('2010-01-01') & date <= as.Date('2023-01-31'))
coffee_r2 <- coffee_r2 %>% group_by(month) %>% mutate(mon_avg = mean(value))%>%
select(month, mon_avg)
#Drop Duplicate rows Coffee
coffee_r2 <- coffee_r2[!duplicated(coffee_r2),]
coffee_tsr <- ts(coffee_r2$mon_avg,start=c(2010),frequency=12)
##Stationary Test
adf.test(coffee_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: coffee_tsr
## Dickey-Fuller = -2.949, Lag order = 5, p-value = 0.1807
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(coffee_tsr), alternative ="stationary")
##
## Augmented Dickey-Fuller Test
##
## data: diff(coffee_tsr)
## Dickey-Fuller = -3.3631, Lag order = 5, p-value = 0.06326
## alternative hypothesis: stationary
#Differencing lowers the p-value to 0.063, marginally above the 0.05 threshold
#We proceed with (d)=1, treating the differenced series as approximately stationary
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(coffee_tsr),main='')

#q=6
#Looking at the impact that prior periods' prices have on subsequent periods
#PACF (p)
pacf(diff(coffee_tsr),main='')

#p=2. There are 2 significant partial autocorrelation values
# ARIMA Custom
coffee_fitR<- Arima(coffee_tsr, order=c(2,1,6))
coffee_fitR
## Series: coffee_tsr
## ARIMA(2,1,6)
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 ma4 ma5 ma6
## -0.1766 0.5327 0.4315 -0.4325 -0.1015 0.058 0.1799 0.3457
## s.e. 0.1415 0.1311 0.1478 0.1425 0.0938 0.084 0.1008 0.0898
##
## sigma^2 = 0.01069: log likelihood = 136.07
## AIC=-254.13 AICc=-252.9 BIC=-226.68
#Check residuals
checkresiduals(coffee_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,6)
## Q* = 18.687, df = 16, p-value = 0.2853
##
## Model df: 8. Total lags used: 24
##Forecast Custom
autoplot(forecast::forecast(coffee_fitR, h=12, level=c(80,95)))

Coffee GARCH
# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(0,0), include.mean = TRUE), distribution.model = "std")
# Fitting Model to Data
coffee_garch <- ugarchfit(spec = garch_model, data = coffee_ts)
coffee_vol <-ts(coffee_garch@fit$sigma^2,start=c(1993),frequency=12)
print(coffee_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 1.163668 0.012917 90.0876 0.000000
## omega 0.006320 0.001610 3.9246 0.000087
## alpha1 0.963469 0.103053 9.3493 0.000000
## beta1 0.035531 0.051711 0.6871 0.492023
## shape 41.707308 39.423928 1.0579 0.290093
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 1.163668 0.033799 34.42924 0.000000
## omega 0.006320 0.002121 2.97960 0.002886
## alpha1 0.963469 0.079086 12.18254 0.000000
## beta1 0.035531 0.049899 0.71205 0.476434
## shape 41.707308 54.714844 0.76227 0.445901
##
## LogLikelihood : -47.06522
##
## Information Criteria
## ------------------------------------
##
## Akaike 0.28765
## Bayes 0.34141
## Shibata 0.28728
## Hannan-Quinn 0.30902
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 216.3 0
## Lag[2*(p+q)+(p+q)-1][2] 295.2 0
## Lag[4*(p+q)+(p+q)-1][5] 490.1 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 4.550 0.03291
## Lag[2*(p+q)+(p+q)-1][5] 4.743 0.17488
## Lag[4*(p+q)+(p+q)-1][9] 4.993 0.43046
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.04656 0.500 2.000 0.8292
## ARCH Lag[5] 0.38595 1.440 1.667 0.9166
## ARCH Lag[7] 0.51618 2.315 1.543 0.9769
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 2.9057
## Individual Statistics:
## mu 2.03276
## omega 0.03716
## alpha1 0.45032
## beta1 0.20222
## shape 0.12263
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.3570 0.1756
## Negative Sign Bias 0.2075 0.8357
## Positive Sign Bias 0.8410 0.4009
## Joint Effect 2.4352 0.4871
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 277.3 8.600e-48
## 2 30 326.2 5.166e-52
## 3 40 323.3 1.835e-46
## 4 50 338.8 5.870e-45
##
##
## Elapsed time : 0.214422
#plot(coffee_garch, which = 1)
coef(coffee_garch)
## mu omega alpha1 beta1 shape
## 1.163667853 0.006319659 0.963469397 0.035530613 41.707307511
# Forecasting
horizon <- 3
coffee_forecast_garch <- ugarchforecast(coffee_garch, n.ahead = horizon)
forecast_mean_coffee <- as.numeric(coffee_forecast_garch@forecast$seriesFor)
actual_values_coffee <- as.numeric(window(coffee_vol, start = c(1993, 1)))
#plot(coffee_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
# main = "GARCH Forecast for coffee Prices", ylab = "Price", xlab = "Time") #%>%
#lines(coffee_ts[(length(coffee_ts)-horizon+1):length(coffee_ts)], col = "blue")
#Based on the significance of the residual tests, try adding an AR(1) mean model
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(1,0), include.mean = TRUE), distribution.model = "std")
coffee_garch2 <- ugarchfit(spec = garch_model, data = coffee_ts)
coffee_garch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.685232 0.069468 9.8640 0.000000
## ar1 0.979045 0.008883 110.2205 0.000000
## omega 0.001102 0.000596 1.8498 0.064345
## alpha1 0.313564 0.100026 3.1348 0.001720
## beta1 0.652411 0.098098 6.6506 0.000000
## shape 4.793099 1.230126 3.8964 0.000098
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.685232 0.030481 22.4807 0.000000
## ar1 0.979045 0.012618 77.5882 0.000000
## omega 0.001102 0.000740 1.4897 0.136295
## alpha1 0.313564 0.112132 2.7964 0.005168
## beta1 0.652411 0.110367 5.9113 0.000000
## shape 4.793099 1.329729 3.6046 0.000313
##
## LogLikelihood : 336.509
##
## Information Criteria
## ------------------------------------
##
## Akaike -1.8260
## Bayes -1.7615
## Shibata -1.8266
## Hannan-Quinn -1.8004
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 6.856 8.833e-03
## Lag[2*(p+q)+(p+q)-1][2] 9.614 3.734e-09
## Lag[4*(p+q)+(p+q)-1][5] 12.488 8.611e-05
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.0524 0.81894
## Lag[2*(p+q)+(p+q)-1][5] 7.1030 0.04896
## Lag[4*(p+q)+(p+q)-1][9] 8.9404 0.08360
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.1976 0.500 2.000 0.6566
## ARCH Lag[5] 1.0086 1.440 1.667 0.7304
## ARCH Lag[7] 1.7858 2.315 1.543 0.7625
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 1.1403
## Individual Statistics:
## mu 0.4216
## ar1 0.1803
## omega 0.1394
## alpha1 0.1449
## beta1 0.1563
## shape 0.3658
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.50505 0.13320
## Negative Sign Bias 0.62289 0.53375
## Positive Sign Bias 0.02833 0.97741
## Joint Effect 7.06926 0.06972 *
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 29.27 0.06184
## 2 30 35.46 0.18988
## 3 40 38.22 0.50522
## 4 50 46.29 0.58376
##
##
## Elapsed time : 0.1508482
#Time series of the conditional variance (volatility) from the standard GARCH(1,1) model
coffee_vol <-ts(coffee_garch2@fit$sigma^2,start=c(1993),frequency=12)
plot(coffee_vol,xlab="",ylab="",main="coffee_Volatility (GARCH[1,1])")
#Inspecting the slots of the fitted GARCH object
names(coffee_garch2@model)
## [1] "modelinc" "modeldesc" "modeldata" "pars" "start.pars"
## [6] "fixed.pars" "maxOrder" "pos.matrix" "fmodel" "pidx"
## [11] "n.start"
names(coffee_garch2@fit)
## [1] "hessian" "cvar" "var" "sigma"
## [5] "condH" "z" "LLH" "log.likelihoods"
## [9] "residuals" "coef" "robust.cvar" "A"
## [13] "B" "scores" "se.coef" "tval"
## [17] "matcoef" "robust.se.coef" "robust.tval" "robust.matcoef"
## [21] "fitted.values" "convergence" "kappa" "persistence"
## [25] "timer" "ipars" "solver"
#Variance
coffee_garch_var <- coffee_garch2@fit$var
#Residuals
coffee_garch_res <- (coffee_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
plot(coffee_garch_res, type = "l")
lines(coffee_garch_var, col = "green")

coffee_forecast_garch2 <- ugarchforecast(coffee_garch2, n.ahead = 12)
coffee_forecast_garch2
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
##
## 0-roll forecast [T0=Feb 2023]:
## Series Sigma
## T+1 1.753 0.1716
## T+2 1.731 0.1719
## T+3 1.709 0.1721
## T+4 1.687 0.1724
## T+5 1.666 0.1727
## T+6 1.646 0.1729
## T+7 1.626 0.1732
## T+8 1.606 0.1734
## T+9 1.587 0.1736
## T+10 1.568 0.1739
## T+11 1.549 0.1741
## T+12 1.531 0.1743
ug_coffee <- coffee_forecast_garch2@forecast$sigmaFor
plot(ug_coffee, type = "l")

coffee_var_t <- c(tail(coffee_garch_var,20),rep(NA,12)) # last 20 observations, padded for the 12 forecast steps
coffee_res_t <- c(tail(coffee_garch_res,20),rep(NA,12)) # last 20 observations, padded for the 12 forecast steps
coffee_f <- c(rep(NA,20),(ug_coffee)^2)
plot(coffee_res_t, type = "l") #Squared residuals
lines(coffee_f, col = "orange") #Forecasted variance
lines(coffee_var_t, col = "green") #Conditional variance

#Plot Predictions
#plot(coffee_forecast_garch2, main = "Forecasted coffee Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)
coffee_mean_forecast <- as.numeric(coffee_forecast_garch2@forecast$seriesFor)
# Plot the mean forecasted values with the two confidence intervals
#plot(coffee_forecast_garch2, main = "Forecasted coffee Prices (GARCH(1,1))")
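The GARCH error metrics below compare a forward forecast against the first observations of the series, which is not a like-for-like test. rugarch's ugarchroll produces genuine rolling out-of-sample forecasts with periodic refitting; a sketch, with the window sizes chosen for illustration:

```r
# Re-fit every 6 observations, producing 24 one-step-ahead forecasts
coffee_roll <- ugarchroll(garch_model, data = coffee_ts, n.ahead = 1,
                          forecast.length = 24, refit.every = 6,
                          refit.window = "moving")
# fpm() reports forecast performance measures (MSE, MAE, directional accuracy)
fpm(coffee_roll)
```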
Coffee Model Comparison
#ARIMA Models
forecast::accuracy(coffee_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002422129 0.1113441 0.07793565 -0.008238448 6.054242 0.2064315
## ACF1
## Training set -0.007994279
forecast::accuracy(coffee_fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.00694441 0.1102114 0.07716969 0.03490727 5.963348 0.2044027
## ACF1
## Training set -0.003319408
forecast::accuracy(auto_coffee)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002408695 0.1124349 0.07772975 -0.0001296227 6.015002 0.2058861
## ACF1
## Training set -0.0005311462
forecast::accuracy(coffee_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0001687003 0.1003911 0.07550881 -0.001675586 4.964701 0.1660969
## ACF1
## Training set 0.0005172509
#AIC and BIC values for coffee_fit, coffee_fit2, and coffee_fitR
## [1] -540.751
## [1] -505.7511
## [1] -545.6944
## [1] -506.8056
## [1] -254.1334
## [1] -226.6847
#GARCH Model
actual_values_coffee <- as.numeric(window(coffee_ts))
actual_values_coffee <- head(actual_values_coffee, length(forecast_mean_coffee))
mae <- mean(abs(forecast_mean_coffee - actual_values_coffee))
mse <- mean((forecast_mean_coffee - actual_values_coffee)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 0.525243672859614
cat(paste("MSE: ", mse, "\n"))
## MSE: 0.27628538402018
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 0.52562856088704
#GARCH Model 2
forecast_mean_coffee2 <- as.numeric(coffee_forecast_garch2@forecast$seriesFor)
actual_values_coffee2 <- as.numeric(window(coffee_ts))
actual_values_coffee2 <- head(actual_values_coffee2, length(forecast_mean_coffee2))
coffee_mae <- mean(abs(forecast_mean_coffee2 - actual_values_coffee2))
coffee_mse <- mean((forecast_mean_coffee2 - actual_values_coffee2)^2)
coffee_rmse <- sqrt(coffee_mse)
# Print the results
cat(paste("MAE: ", coffee_mae, "\n"))
## MAE: 0.952327512190766
cat(paste("MSE: ", coffee_mse, "\n"))
## MSE: 0.925844918652174
cat(paste("RMSE: ", coffee_rmse, "\n"))
## RMSE: 0.962208355114512
Exploring Aluminum
#Looking at the price of Aluminum over the last 20 years
ggplot(Aluminum, aes(date, Price)) +
geom_line() +
labs(title = "Aluminum Price over Time")

#Summary Aluminum
summary(Aluminum)
## Price date month
## Min. : 0 Min. :2002-10-23 Length:4953
## 1st Qu.:1770 1st Qu.:2008-05-30 Class :character
## Median :1971 Median :2013-04-24 Mode :character
## Mean :2071 Mean :2013-04-10
## 3rd Qu.:2390 3rd Qu.:2018-03-15
## Max. :3849 Max. :2023-02-16
## The lowest prices are less than half of the highest price.
#Aluminum monthly average
alum_m <- Aluminum %>% group_by(month) %>% mutate(mon_avg = mean(Price))%>%
select(month, mon_avg)
#Drop Duplicate rows Aluminum
alum_m <- alum_m[!duplicated(alum_m),]
#Creating Time series data Aluminum
alum_ts <- ts(alum_m$mon_avg,start=c(2002),frequency=12)
#Plotting Aluminum
chartSeries(alum_ts)

#Huge spike in price in the early 2000s that is unique among these commodities
#Looking at trends
autoplot(decompose((alum_ts)),main="")

#Looking for daily or weekly trends Aluminum
Aluminum_r <- Aluminum %>%
filter(date >= as.Date('2022-12-31') & date <= as.Date('2023-01-31'))
ggplot(Aluminum_r, aes(date, Price)) +
geom_line() +
labs(title = "Aluminum Prices Daily") + xlab("time") + ylab("prices")

## No weekly trends shown through plotting
#Aluminum Prices Controlling for inflation
alum_CPI <- read.csv("alum_cpi.csv")
alum_adj <- left_join(alum_m, alum_CPI, by = c("month" = "date"))
alum_adj$adj_price <- alum_adj$mon_avg / alum_adj$Value
alum_ts_adj <- ts(alum_adj$adj_price, start = c(2002, 10), frequency = 12)
chartSeries(alum_ts_adj)

ggplot(data = alum_adj, aes(x = month, y = adj_price, group = 1)) +
geom_line() +
labs(x = "Month", y = "Adjusted alum Price", title = "alum Prices Adjusted for Inflation")
## Warning: Removed 3 rows containing missing values (`geom_line()`).

#Inflation vs. Price without scaling
ggplot(alum_adj, aes(x = month)) +
geom_line(aes(y = Value, color = "Inflation Rate", group = 1)) +
geom_line(aes(y = mon_avg, color = "alum Price", group = 1)) +
scale_color_manual(values = c("blue", "red")) +
xlab("Month") +
ylab("Value") +
ggtitle("Inflation Rate and alum Prices over Time")
## Warning: Removed 3 rows containing missing values (`geom_line()`).

# Scaling the data
alum_adj$norm_value <- scale(alum_adj$Value)
alum_adj$norm_mon_avg <- scale(alum_adj$mon_avg)
# Calculate the adjusted alum price by dividing the monthly average alum price by the CPI value
# Plot the normalized values for 'Value' and 'mon_avg' and the adjusted alum price as lines
alum_adj$month <- as.Date(paste0(alum_adj$month, "-01"))
ggplot(alum_adj, aes(x = month)) +
geom_line(aes(y = norm_value, color = "CPI", group = 1)) +
geom_line(aes(y = norm_mon_avg, color = "alum Price", group = 1)) +
scale_color_manual(values = c("darkblue", "orange")) +
xlab("Date") +
ylab("Normalized Values") +
ggtitle("Inflation Rate vs. Price - Aluminum") +
scale_x_date(date_labels = "%Y")
## Warning: Removed 3 rows containing missing values (`geom_line()`).
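Dividing by the raw CPI index puts the adjusted prices on an arbitrary scale; re-basing the CPI to a reference month expresses them in constant dollars of that period. A sketch, assuming alum_adj as built above:

```r
# Express prices in dollars of the first month in the series (the base period)
base_cpi <- alum_adj$Value[1]
alum_adj$real_price <- alum_adj$mon_avg * (base_cpi / alum_adj$Value)
```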

#Seasonal Plots
#Format back to date in aggregated month column
alum_m %>%
mutate(month = ym(month)) -> Alum_mon
#Convert to tibble
Alum_mon <- as_tibble(Alum_mon)
#convert data frame into time series tsibble object
Alum_mon %>% as_tsibble(index = month) -> Alum_mon_ts
#Format data for Aluminum seasonal plots
Alum_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> Alum_sea_ts
#Aluminum seasonal plot by month
autoplot(Alum_sea_ts, mon_avg) +
ylab("monthly Aluminum prices") +
xlab("")

#Different view Aluminum seasonal plot
Alum_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Aluminum prices")

#Aluminum seasonal subseries plot
Alum_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Aluminum prices", title = "Seasonal subseries plot: Aluminum prices")

Aluminum ARIMA
##Stationary Test
adf.test(alum_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: alum_ts
## Dickey-Fuller = -2.6022, Lag order = 6, p-value = 0.3224
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(alum_ts), alternative ="stationary")
## Warning in adf.test(diff(alum_ts), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(alum_ts)
## Dickey-Fuller = -6.0687, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
#Differencing lowers the p-value below 0.05, making the test significant
#The data is now stationary, so (d)=1
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(alum_ts),main='')

#q=3
#Looking at the impact that prior periods' prices have on subsequent periods
#PACF (p)
pacf(diff(alum_ts),main='')

#p=3. There are 3 significant partial autocorrelation values
# ARIMA Custom (best fit)
aluminum_fit<- Arima(alum_ts, order=c(3,1,3))
aluminum_fit
## Series: alum_ts
## ARIMA(3,1,3)
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3
## 1.1325 -1.2485 0.3465 -0.8396 1.0051 -0.0065
## s.e. 0.1714 0.1397 0.1696 0.1818 0.1572 0.1835
##
## sigma^2 = 11595: log likelihood = -1487.02
## AIC=2988.03 AICc=2988.51 BIC=3012.51
# ARIMA Alternate Custom
aluminum_fit2<- Arima(alum_ts, order=c(3,1,2))
aluminum_fit2
## Series: alum_ts
## ARIMA(3,1,2)
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2
## -0.3140 0.4679 -0.0601 0.6432 -0.3134
## s.e. 1.1728 0.7328 0.3159 1.1674 1.1311
##
## sigma^2 = 11831: log likelihood = -1488.07
## AIC=2988.13 AICc=2988.49 BIC=3009.11
#Check residuals
checkresiduals(aluminum_fit)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,3)
## Q* = 23.776, df = 18, p-value = 0.1625
##
## Model df: 6. Total lags used: 24
#Auto-fit Arima
auto_aluminum<- auto.arima(alum_ts)
auto_aluminum
## Series: alum_ts
## ARIMA(2,0,1)(2,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 sar1 sar2 sma1 mean
## 1.3678 -0.4051 -0.0960 -0.1301 -0.1949 0.0914 2039.9543
## s.e. 0.2242 0.2184 0.2492 0.3281 0.0694 0.3354 128.7307
##
## sigma^2 = 11389: log likelihood = -1490.23
## AIC=2996.45 AICc=2997.06 BIC=3024.46
##Forecast Plots
##Forecast Custom
autoplot(forecast::forecast(aluminum_fit, h=12, level=c(80,95)))

##Forecast Custom 2
autoplot(forecast::forecast(aluminum_fit2, h=12, level=c(80,95)))

##Forecast Auto
autoplot(forecast::forecast(auto_aluminum, h=12, level=c(80,95)))

#ARIMA using more recent data
aluminum_r2 <- Aluminum %>%
filter(date >= as.Date('2019-01-01') & date <= as.Date('2023-01-31'))
aluminum_r2 <- aluminum_r2 %>% group_by(month) %>% mutate(mon_avg = mean(Price))%>%
select(month, mon_avg)
#Drop Duplicate rows Aluminum
aluminum_r2 <- aluminum_r2[!duplicated(aluminum_r2),]
alum_tsr <- ts(aluminum_r2$mon_avg,start=c(2019),frequency=12)
##Stationary Test
adf.test(alum_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: alum_tsr
## Dickey-Fuller = -2.2624, Lag order = 3, p-value = 0.4692
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(alum_tsr), alternative ="stationary")
##
## Augmented Dickey-Fuller Test
##
## data: diff(alum_tsr)
## Dickey-Fuller = -3.464, Lag order = 3, p-value = 0.0574
## alternative hypothesis: stationary
#Differencing lowers the p-value to 0.057, marginally above the 0.05 threshold
#We proceed with (d)=1, treating the differenced series as approximately stationary
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(alum_tsr),main='')

#q=1
#Looking at the impact that prior periods' prices have on subsequent periods
#PACF (p)
pacf(diff(alum_tsr),main='')

#p=2. There are 2 significant partial autocorrelation values
# ARIMA Custom
aluminum_fitR<- Arima(alum_tsr, order=c(2,1,1))
aluminum_fitR
## Series: alum_tsr
## ARIMA(2,1,1)
##
## Coefficients:
## ar1 ar2 ma1
## -0.2423 0.1572 0.8567
## s.e. 0.1952 0.1712 0.1305
##
## sigma^2 = 15008: log likelihood = -297.69
## AIC=603.39 AICc=604.32 BIC=610.87
#Check residuals
checkresiduals(aluminum_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)
## Q* = 11.527, df = 7, p-value = 0.1172
##
## Model df: 3. Total lags used: 10
##Forecast Custom
autoplot(forecast::forecast(aluminum_fitR, h=12, level=c(80,95)))

#Printing Predictions
aluminum_predictions <- forecast::forecast(aluminum_fitR,h=12)
print(aluminum_predictions$mean)
## Jan Feb Mar Apr May Jun Jul Aug
## 2023 1888.621 1880.203 1885.581 1882.954 1884.437 1883.664 1884.085
## 2024 1883.933
## Sep Oct Nov Dec
## 2023 1883.861 1883.982 1883.917 1883.952
## 2024
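For the interactive dashboard described in the objective, the point forecasts and prediction intervals can be bundled into a data frame for export; a sketch, where the output file name is illustrative:

```r
# Collect the mean forecast with its 80% and 95% intervals for downstream use
alum_fc <- forecast::forecast(aluminum_fitR, h = 12, level = c(80, 95))
alum_fc_df <- data.frame(
  mean = as.numeric(alum_fc$mean),
  lo80 = alum_fc$lower[, 1], hi80 = alum_fc$upper[, 1],
  lo95 = alum_fc$lower[, 2], hi95 = alum_fc$upper[, 2]
)
write.csv(alum_fc_df, "aluminum_forecast.csv", row.names = FALSE)
```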
Aluminum GARCH
# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(0,0), include.mean = TRUE), distribution.model = "std")
# Fitting Model to Data
alum_fit <- ugarchfit(spec = garch_model, data = alum_ts)
print(alum_fit)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 1.8323e+03 1.4411e+01 127.1517 0.000000
## omega 2.6125e+03 1.0188e+03 2.5644 0.010336
## alpha1 9.3815e-01 1.1299e-01 8.3027 0.000000
## beta1 6.0847e-02 5.9512e-02 1.0224 0.306580
## shape 1.0000e+02 7.5167e+01 1.3304 0.183393
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 1.8323e+03 3.9258e+01 46.6743 0.000000
## omega 2.6125e+03 1.1162e+03 2.3406 0.019254
## alpha1 9.3815e-01 7.6338e-02 12.2894 0.000000
## beta1 6.0847e-02 8.1280e-02 0.7486 0.454095
## shape 1.0000e+02 5.8292e+00 17.1549 0.000000
##
## LogLikelihood : -1715.166
##
## Information Criteria
## ------------------------------------
##
## Akaike 14.042
## Bayes 14.114
## Shibata 14.041
## Hannan-Quinn 14.071
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 169.4 0
## Lag[2*(p+q)+(p+q)-1][2] 234.1 0
## Lag[4*(p+q)+(p+q)-1][5] 381.1 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 18.16 2.032e-05
## Lag[2*(p+q)+(p+q)-1][5] 18.92 3.828e-05
## Lag[4*(p+q)+(p+q)-1][9] 20.28 1.702e-04
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.05877 0.500 2.000 0.8085
## ARCH Lag[5] 1.01577 1.440 1.667 0.7283
## ARCH Lag[7] 1.80840 2.315 1.543 0.7578
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 29.6025
## Individual Statistics:
## mu 0.21228
## omega 0.08013
## alpha1 0.84836
## beta1 0.15114
## shape 14.97564
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.3182 0.1887
## Negative Sign Bias 0.2831 0.7773
## Positive Sign Bias 1.3221 0.1874
## Joint Effect 2.6093 0.4559
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 177.4 9.826e-28
## 2 30 193.2 3.444e-26
## 3 40 200.2 1.498e-23
## 4 50 235.2 3.769e-26
##
##
## Elapsed time : 0.2433379
#plot(alum_fit, which = 1)
coef(alum_fit)
## mu omega alpha1 beta1 shape
## 1.832340e+03 2.612539e+03 9.381534e-01 6.084659e-02 1.000000e+02
# Forecasting
horizon <- 3
alum_forecast_garch <- ugarchforecast(alum_fit, n.ahead = horizon)
forecast_mean_alum <- as.numeric(alum_forecast_garch@forecast$seriesFor)
actual_values_alum <- as.numeric(window(alum_ts))
#plot(alum_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
# main = "GARCH Forecast for alum Prices", ylab = "Price", xlab = "Time") #%>%
#lines(alum_ts[(length(alum_ts)-horizon+1):length(alum_ts)], col = "blue")
#Based on the significance results above, let's try adding an AR(1) mean model
garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
                          distribution.model = "std")
alum_garch2 <- ugarchfit(spec = garch_model, data = alum_ts)
alum_garch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 2481.61680 103.331950 24.0160 0.000000
## ar1 1.00000 0.007311 136.7822 0.000000
## omega 485.14727 338.064049 1.4351 0.151266
## alpha1 0.24396 0.077693 3.1401 0.001689
## beta1 0.73411 0.067507 10.8745 0.000000
## shape 8.58310 4.196208 2.0454 0.040811
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 2481.61680 31.379364 79.0844 0.000000
## ar1 1.00000 0.010177 98.2653 0.000000
## omega 485.14727 354.651855 1.3680 0.171327
## alpha1 0.24396 0.066384 3.6750 0.000238
## beta1 0.73411 0.048017 15.2887 0.000000
## shape 8.58310 3.421984 2.5082 0.012134
##
## LogLikelihood : -1464.623
##
## Information Criteria
## ------------------------------------
##
## Akaike 12.005
## Bayes 12.091
## Shibata 12.004
## Hannan-Quinn 12.040
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 8.385 3.784e-03
## Lag[2*(p+q)+(p+q)-1][2] 9.719 2.831e-09
## Lag[4*(p+q)+(p+q)-1][5] 10.983 4.068e-04
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.0000346 0.9953
## Lag[2*(p+q)+(p+q)-1][5] 0.1087920 0.9978
## Lag[4*(p+q)+(p+q)-1][9] 0.7211680 0.9950
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.1206 0.500 2.000 0.7284
## ARCH Lag[5] 0.1928 1.440 1.667 0.9672
## ARCH Lag[7] 0.2565 2.315 1.543 0.9949
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 1.4826
## Individual Statistics:
## mu 0.70646
## ar1 0.18760
## omega 0.09967
## alpha1 0.10988
## beta1 0.14469
## shape 0.20105
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.5388 0.5905
## Negative Sign Bias 1.4627 0.1449
## Positive Sign Bias 0.9102 0.3636
## Joint Effect 4.0548 0.2556
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 27.57 0.09203
## 2 30 42.14 0.05452
## 3 40 42.51 0.32235
## 4 50 56.43 0.21707
##
##
## Elapsed time : 0.1771231
#Time series of the conditional variance (volatility) from the standard GARCH(1,1) model
alum_vol <- ts(alum_garch2@fit$sigma^2, start = c(1993), frequency = 12)
#plot(alum_vol,xlab="",ylab="",main="alum_Volatility (GARCH[1,1])")
#GARCH 3
names(alum_garch2@model)
## [1] "modelinc" "modeldesc" "modeldata" "pars" "start.pars"
## [6] "fixed.pars" "maxOrder" "pos.matrix" "fmodel" "pidx"
## [11] "n.start"
names(alum_garch2@fit)
## [1] "hessian" "cvar" "var" "sigma"
## [5] "condH" "z" "LLH" "log.likelihoods"
## [9] "residuals" "coef" "robust.cvar" "A"
## [13] "B" "scores" "se.coef" "tval"
## [17] "matcoef" "robust.se.coef" "robust.tval" "robust.matcoef"
## [21] "fitted.values" "convergence" "kappa" "persistence"
## [25] "timer" "ipars" "solver"
#Variance
alum_garch_var <- alum_garch2@fit$var
#Residuals
alum_garch_res <- (alum_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
plot(alum_garch_res, type = "l")
lines(alum_garch_var, col = "green")

alum_forecast_garch2 <- ugarchforecast(alum_garch2, n.ahead = 12)
alum_forecast_garch2
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
##
## 0-roll forecast [T0=May 2022]:
## Series Sigma
## T+1 1338 51.44
## T+2 1338 55.44
## T+3 1338 59.08
## T+4 1338 62.45
## T+5 1338 65.57
## T+6 1338 68.48
## T+7 1338 71.22
## T+8 1338 73.80
## T+9 1338 76.24
## T+10 1338 78.55
## T+11 1338 80.74
## T+12 1338 82.84
ug_alum <- alum_forecast_garch2@forecast$sigmaFor
plot(ug_alum, type = "l")

alum_var_t <- c(tail(alum_garch_var,20),rep(NA,10)) # last 20 conditional variances, padded for the forecast horizon
alum_res_t <- c(tail(alum_garch_res,20),rep(NA,10)) # last 20 squared residuals, padded for the forecast horizon
alum_f <- c(rep(NA,20),(ug_alum)^2) # forecast variance aligned after the history
plot(alum_res_t, type = "l") #Squared residuals
lines(alum_f, col = "orange") #Forecast variance
lines(alum_var_t, col = "green") #Conditional variance

#Plot Predictions
#plot(alum_forecast_garch2, main = "Forecasted alum Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)
alum_mean_forecast <- as.numeric(alum_forecast_garch2@forecast$seriesFor)
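The mean forecast above comes out as a bare numeric vector with no dates attached; wrapping it in a `ts` that continues the monthly index makes it easy to plot against the history. A small sketch with hypothetical stand-in values, assuming the fitted series ends at T0 = May 2022 as the forecast header reports:

```r
# Stand-in for alum_mean_forecast from the chunk above (hypothetical values)
alum_mean_forecast <- rep(1338, 12)

# Continue the monthly index one step past T0 = May 2022
alum_mean_fc_ts <- ts(alum_mean_forecast, start = c(2022, 6), frequency = 12)

# The forecast now carries calendar dates, e.g. for overlay plotting:
# ts.plot(alum_ts, alum_mean_fc_ts, col = c("black", "red"))
```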
Model Comparison
Aluminum
#ARIMA Models
forecast::accuracy(aluminum_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.178964 106.1314 77.14823 -0.225094 3.592564 0.2066838
## ACF1
## Training set -0.001764145
forecast::accuracy(aluminum_fit2)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.223632 107.4307 79.04678 -0.2338294 3.705623 0.2117701
## ACF1
## Training set -0.000467848
forecast::accuracy(auto_aluminum)
## ME RMSE MAE MPE MAPE MASE
## Training set -1.456311 105.1855 78.26729 -0.3282913 3.671699 0.2096818
## ACF1
## Training set 0.003850494
forecast::accuracy(aluminum_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set -7.505676 117.4017 87.07036 -0.3747184 3.776837 0.1736574
## ACF1
## Training set 0.02022298
## [1] 2988.035
## [1] 3012.515
## [1] 2988.132
## [1] 3009.115
## [1] 603.3853
## [1] 610.8701
#GARCH Model
#Note: the GARCH comparisons below score forward forecasts against the earliest
#observations of the series rather than a recent holdout, so treat the metrics as illustrative
actual_values_alum <- head(actual_values_alum, length(forecast_mean_alum))
mae <- mean(abs(forecast_mean_alum - actual_values_alum))
mse <- mean((forecast_mean_alum - actual_values_alum)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 652.061576157405
cat(paste("MSE: ", mse, "\n"))
## MSE: 426606.693369426
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 653.151355636216
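A stricter way to score these models is a holdout: fit on all but the last few months and compare the forecast against the months held back. A minimal sketch of that workflow using base `arima()` on a simulated random-walk series, since the commodity data is not reproduced here; in the report, `alum_ts` and the `rugarch`/`forecast` fits would take the place of `y` and `fit`:

```r
# Holdout evaluation: fit on all but the last h months, score on the rest
set.seed(1)
y <- ts(cumsum(rnorm(120)) + 100, frequency = 12, start = c(1993, 1))
h <- 3
train <- window(y, end = time(y)[length(y) - h])  # drop the final h observations
test  <- tail(as.numeric(y), h)                   # the held-out actuals

fit <- arima(train, order = c(1, 1, 0))
fc  <- predict(fit, n.ahead = h)$pred             # h-step-ahead mean forecast

mae  <- mean(abs(as.numeric(fc) - test))
rmse <- sqrt(mean((as.numeric(fc) - test)^2))
```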
#GARCH Model 2
forecast_mean_alum2 <- as.numeric(alum_forecast_garch2@forecast$seriesFor)
actual_values_alum2 <- as.numeric(window(alum_ts, start = c(1993, 1)))
## Warning in window.default(x, ...): 'start' value not changed
actual_values_alum2 <- head(actual_values_alum2, length(forecast_mean_alum2))
alum_mae <- mean(abs(forecast_mean_alum2 - actual_values_alum2))
alum_mse <- mean((forecast_mean_alum2 - actual_values_alum2)^2)
alum_rmse <- sqrt(alum_mse)
# Print the results
cat(paste("MAE: ", alum_mae, "\n"))
## MAE: 1276.58131835304
cat(paste("MSE: ", alum_mse, "\n"))
## MSE: 1775413.44915285
cat(paste("RMSE: ", alum_rmse, "\n"))
## RMSE: 1332.44641511501
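For the write-up, the RMSE values printed above can be lined up in one small table. The numbers are copied from the printouts; note the ARIMA rows are in-sample training errors while the GARCH rows come from the illustrative comparison against early observations, so the two groups are not directly comparable:

```r
# RMSE values transcribed from the accuracy printouts above
alum_models <- data.frame(
  model = c("aluminum_fit", "aluminum_fit2", "auto_aluminum", "aluminum_fitR",
            "alum GARCH (constant mean)", "alum AR(1)-GARCH"),
  RMSE  = c(106.13, 107.43, 105.19, 117.40, 653.15, 1332.45)
)
alum_models[order(alum_models$RMSE), ]  # best (lowest RMSE) first
```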
Exploring Soybean
# Looking at the price of Soybean over the last 30 years
ggplot(soybean, aes(date, value)) +
geom_line() +
labs(title = "Soybean Price by Day")

#Summary
summary(soybean)
## date value month
## Min. :1993-01-04 Min. : 4.100 Length:7599
## 1st Qu.:2000-07-13 1st Qu.: 5.940 Class :character
## Median :2008-02-05 Median : 8.645 Mode :character
## Mean :2008-02-02 Mean : 8.930
## 3rd Qu.:2015-08-20 3rd Qu.:10.707
## Max. :2023-02-17 Max. :17.690
## Price has increased significantly over 30 years, outpacing inflation.
#Soybean monthly average
soy_m <- soybean %>%
  group_by(month) %>%
  mutate(mon_avg = mean(value)) %>%
  select(month, mon_avg)
#Drop Duplicate rows Soybean
soy_m <- soy_m[!duplicated(soy_m),]
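The group_by %>% mutate %>% drop-duplicates pattern above computes one average per month; base R's `aggregate` (or dplyr's `summarise`) expresses the same reduction in a single step. A minimal sketch on a hypothetical toy frame:

```r
# Toy daily prices spanning two months (hypothetical values)
prices <- data.frame(
  month = c("1993-01", "1993-01", "1993-02"),
  value = c(5.0, 6.0, 7.0)
)

# One-step monthly average, equivalent to group_by %>% mutate %>% de-duplicate
mon_avg <- aggregate(value ~ month, data = prices, FUN = mean)
```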
#Creating Time series data Soybean
soy_ts <- ts(soy_m$mon_avg,start=c(1993),frequency=12)
#Plotting Soybean
chartSeries(soy_ts)

## Price jumps in 2008, 2010, and the 2020s.
#Looking at trends
autoplot(decompose((soy_ts)),main="")

#Looking for daily or weekly trends Soybean
soybean_r <- soybean %>%
filter(date >= as.Date('2022-11-01') & date <= as.Date('2022-12-01'))
ggplot(soybean_r, aes(date, value)) +
geom_line() +
labs(title = "Soybean Prices Daily") + xlab("time") + ylab("prices")

## No daily trends detected
#Seasonal Plots
#Format back to date in aggregated month column
soy_m %>%
mutate(month = ym(month)) -> Soy_mon
#Convert to tibble
Soy_mon <- as_tibble(Soy_mon)
#convert data frame into time series tsibble object
Soy_mon %>% as_tsibble(index = month) -> Soy_mon_ts
#Format data for Soybean seasonal plots
Soy_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> Soy_sea_ts
#Soybean seasonal plot by month
autoplot(Soy_sea_ts, mon_avg) +
ylab("monthly Soybean prices") +
xlab("")

#Different view Soybean seasonal plot
Soy_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Soybean prices")

#Soybean seasonal subseries plot
Soy_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Soybean prices", title = "Seasonal subseries plot: Soybean prices")

Soybean ARIMA
##Stationary Test
adf.test(soy_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: soy_ts
## Dickey-Fuller = -2.4842, Lag order = 7, p-value = 0.3725
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(soy_ts), alternative ="stationary")
## Warning in adf.test(diff(soy_ts), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(soy_ts)
## Dickey-Fuller = -8.898, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#This decreases our p-value and creates significance
#Data is now stationary making the value of (d)=1
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soy_ts),main='')

#q=5
#Looking at the impact prices in prior periods have on subsequent periods
#PACF (p)
pacf(diff(soy_ts),main='')

#p=3. There are 3 significant partial autocorrelation lags
# ARIMA Custom (best fit)
soy_fit<- Arima(soy_ts, order=c(3,1,5))
soy_fit
## Series: soy_ts
## ARIMA(3,1,5)
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 ma4 ma5
## 0.7419 -0.8797 0.6695 -0.3814 0.7881 -0.5498 -0.1477 -0.2369
## s.e. 0.1721 0.0712 0.1508 0.1745 0.0789 0.1561 0.0695 0.0592
##
## sigma^2 = 0.2638: log likelihood = -268.02
## AIC=554.05 AICc=554.56 BIC=589.05
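The ACF/PACF-guided order choice above can be cross-checked with a small grid search scored by AICc. A sketch using base `arima()` with a hand-rolled AICc on simulated data; in the report, `soy_ts` (and `forecast::Arima`) would replace `y`:

```r
# Grid-search ARIMA(p,1,q) orders by AICc on a simulated series
set.seed(7)
y <- ts(cumsum(rnorm(150)) + 8, frequency = 12, start = c(1993, 1))

# AICc = AIC + small-sample correction; k counts coefficients plus sigma^2
aicc <- function(fit, n) {
  k <- length(fit$coef) + 1
  fit$aic + 2 * k * (k + 1) / (n - k - 1)
}

grid <- expand.grid(p = 0:3, q = 0:3)
grid$AICc <- apply(grid, 1, function(g) {
  fit <- try(arima(y, order = c(g["p"], 1, g["q"])), silent = TRUE)
  if (inherits(fit, "try-error")) NA else aicc(fit, length(y))
})
best <- grid[which.min(grid$AICc), ]  # lowest-AICc order wins
```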
#Check residuals
checkresiduals(soy_fit)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,5)
## Q* = 19.568, df = 16, p-value = 0.2403
##
## Model df: 8. Total lags used: 24
#Auto-fit Arima
auto_soy<- auto.arima(soy_ts)
auto_soy
## Series: soy_ts
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## 0.3588
## s.e. 0.0490
##
## sigma^2 = 0.2724: log likelihood = -277.08
## AIC=558.16 AICc=558.2 BIC=565.94
##Forecast Plots
##Forecast Custom
autoplot(forecast::forecast(soy_fit, h=12, level=c(80,95)))

##Forecast Auto (simpler model, higher AIC)
autoplot(forecast::forecast(auto_soy, h=12, level=c(80,95)))

#ARIMA using more recent data
soy_r2 <- soybean %>%
  filter(date >= as.Date('2019-01-01') & date <= as.Date('2023-01-31'))
soy_r2 <- soy_r2 %>%
  group_by(month) %>%
  mutate(mon_avg = mean(value)) %>%
  select(month, mon_avg)
#Drop Duplicate rows soy
soy_r2 <- soy_r2[!duplicated(soy_r2),]
soy_tsr <- ts(soy_r2$mon_avg,start=c(2019),frequency=12)
##Stationary Test
adf.test(soy_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: soy_tsr
## Dickey-Fuller = -3.7175, Lag order = 3, p-value = 0.03277
## alternative hypothesis: stationary
#Low p-value already
## After first-order differencing
adf.test(diff(soy_tsr), alternative ="stationary")
##
## Augmented Dickey-Fuller Test
##
## data: diff(soy_tsr)
## Dickey-Fuller = -3.0875, Lag order = 3, p-value = 0.1398
## alternative hypothesis: stationary
#Differencing raises the p-value here rather than lowering it; the undifferenced
#series already tests stationary, so d=0
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soy_tsr),main='')

#q=6
#Looking at the impact prices in prior periods have on subsequent periods
#PACF (p)
pacf(diff(soy_tsr),main='')

#p=1. The partial autocorrelation plot supports a first-order AR term
# ARIMA Custom (better model)
soy_fitR<- Arima(soy_tsr, order=c(1,0,6))
soy_fitR
## Series: soy_tsr
## ARIMA(1,0,6) with non-zero mean
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 ma5 ma6 mean
## 0.853 0.4663 0.5158 0.5678 0.6085 0.2040 0.3551 12.2780
## s.e. 0.087 0.1555 0.1633 0.1643 0.1593 0.1782 0.1849 1.5803
##
## sigma^2 = 0.3012: log likelihood = -38.46
## AIC=94.91 AICc=99.53 BIC=111.94
#Check residuals
checkresiduals(soy_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,6) with non-zero mean
## Q* = 1.591, df = 3, p-value = 0.6614
##
## Model df: 7. Total lags used: 10
##Forecast Custom
autoplot(forecast::forecast(soy_fitR, h=12, level=c(80,95)))

#Printing Predictions
soy_predictions <- forecast::forecast(soy_fitR,h=12)
print(soy_predictions$mean)
## Jan Feb Mar Apr May Jun Jul Aug
## 2023 15.62904 15.89738 15.84372 16.14463 15.71070 15.45801 14.99059
## 2024 13.50309
## Sep Oct Nov Dec
## 2023 14.59188 14.25177 13.96165 13.71418
## 2024
Soybean GARCH
# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
                          distribution.model = "std")
# Fitting Model to Data
soy_garch <- ugarchfit(spec = garch_model, data = soy_ts)
print(soy_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 5.784714 0.048936 118.209863 0.000000
## omega 0.057952 0.024057 2.408937 0.015999
## alpha1 0.999000 0.084219 11.861904 0.000000
## beta1 0.000000 0.004047 0.000002 0.999998
## shape 99.999633 65.539550 1.525791 0.127062
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 5.784714 0.106745 54.192013 0.00000
## omega 0.057952 0.052313 1.107794 0.26795
## alpha1 0.999000 0.059852 16.691208 0.00000
## beta1 0.000000 0.004606 0.000002 1.00000
## shape 99.999633 15.751209 6.348696 0.00000
##
## LogLikelihood : -786.6785
##
## Information Criteria
## ------------------------------------
##
## Akaike 4.3739
## Bayes 4.4277
## Shibata 4.3735
## Hannan-Quinn 4.3953
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 245.5 0
## Lag[2*(p+q)+(p+q)-1][2] 340.8 0
## Lag[4*(p+q)+(p+q)-1][5] 598.0 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.8803 0.3481
## Lag[2*(p+q)+(p+q)-1][5] 1.7840 0.6703
## Lag[4*(p+q)+(p+q)-1][9] 2.9473 0.7676
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.4557 0.500 2.000 0.4996
## ARCH Lag[5] 1.6494 1.440 1.667 0.5539
## ARCH Lag[7] 1.9878 2.315 1.543 0.7199
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 13.5366
## Individual Statistics:
## mu 0.8402
## omega 0.0176
## alpha1 2.4346
## beta1 1.0869
## shape 3.8682
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 2.2671 0.02398 **
## Negative Sign Bias 0.6849 0.49388
## Positive Sign Bias 1.9662 0.05005 *
## Joint Effect 7.4019 0.06013 *
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 994.5 8.063e-199
## 2 30 1169.8 9.517e-228
## 3 40 1150.3 7.081e-216
## 4 50 1180.0 6.322e-215
##
##
## Elapsed time : 0.1912479
#plot(soy_garch, which = 1)
coef(soy_garch)
## mu omega alpha1 beta1 shape
## 5.784714e+00 5.795219e-02 9.989997e-01 8.449528e-09 9.999963e+01
# Forecasting
horizon <- 3
soy_forecast_garch <- ugarchforecast(soy_garch, n.ahead = horizon)
forecast_mean_soy <- as.numeric(soy_forecast_garch@forecast$seriesFor)
actual_values_soy <- as.numeric(window(soy_ts, start = c(1993, 1)))
#plot(soy_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
# main = "GARCH Forecast for soy Prices", ylab = "Price", xlab = "Time") #%>%
#lines(soy_ts[(length(soy_ts)-horizon+1):length(soy_ts)], col = "blue")
#Based on the significance results above, let's try adding an AR(1) mean model
garch_model <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                          mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
                          distribution.model = "std")
soy_garch2 <- ugarchfit(spec = garch_model, data = soy_ts)
soy_garch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 5.795659 0.274167 21.1392 0.000000
## ar1 0.998623 0.006327 157.8332 0.000000
## omega 0.047614 0.017129 2.7797 0.005441
## alpha1 0.539779 0.175683 3.0725 0.002123
## beta1 0.459221 0.107228 4.2827 0.000018
## shape 3.807732 0.792698 4.8035 0.000002
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 5.795659 0.033196 174.5904 0.000000
## ar1 0.998623 0.006172 161.8113 0.000000
## omega 0.047614 0.016961 2.8073 0.004996
## alpha1 0.539779 0.159444 3.3854 0.000711
## beta1 0.459221 0.135799 3.3816 0.000721
## shape 3.807732 0.697483 5.4592 0.000000
##
## LogLikelihood : -241.1973
##
## Information Criteria
## ------------------------------------
##
## Akaike 1.3657
## Bayes 1.4302
## Shibata 1.3652
## Hannan-Quinn 1.3914
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 23.05 1.580e-06
## Lag[2*(p+q)+(p+q)-1][2] 23.34 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5] 24.86 1.004e-10
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.465 0.2261
## Lag[2*(p+q)+(p+q)-1][5] 2.010 0.6160
## Lag[4*(p+q)+(p+q)-1][9] 4.450 0.5143
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.2109 0.500 2.000 0.6461
## ARCH Lag[5] 0.4451 1.440 1.667 0.8997
## ARCH Lag[7] 1.3605 2.315 1.543 0.8489
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 1.4111
## Individual Statistics:
## mu 0.28744
## ar1 0.03118
## omega 0.30380
## alpha1 0.39278
## beta1 0.25368
## shape 0.69652
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.0402 0.2990
## Negative Sign Bias 0.6755 0.4998
## Positive Sign Bias 1.3498 0.1779
## Joint Effect 3.1373 0.3709
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 25.51 0.14432
## 2 30 45.57 0.02589
## 3 40 48.83 0.13453
## 4 50 64.52 0.06770
##
##
## Elapsed time : 0.1643391
#Time series of the conditional variance (volatility) from the standard GARCH(1,1) model
soy_vol <- ts(soy_garch2@fit$sigma^2, start = c(1993), frequency = 12)
#plot(soy_vol,xlab="",ylab="",main="soy_Volatility (GARCH[1,1])")
#GARCH 3
names(soy_garch2@model)
## [1] "modelinc" "modeldesc" "modeldata" "pars" "start.pars"
## [6] "fixed.pars" "maxOrder" "pos.matrix" "fmodel" "pidx"
## [11] "n.start"
names(soy_garch2@fit)
## [1] "hessian" "cvar" "var" "sigma"
## [5] "condH" "z" "LLH" "log.likelihoods"
## [9] "residuals" "coef" "robust.cvar" "A"
## [13] "B" "scores" "se.coef" "tval"
## [17] "matcoef" "robust.se.coef" "robust.tval" "robust.matcoef"
## [21] "fitted.values" "convergence" "kappa" "persistence"
## [25] "timer" "ipars" "solver"
#Variance
soy_garch_var <- soy_garch2@fit$var
#Residuals
soy_garch_res <- (soy_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
plot(soy_garch_res, type = "l")
lines(soy_garch_var, col = "green")

soy_forecast_garch2 <- ugarchforecast(soy_garch2, n.ahead = 12)
soy_forecast_garch2
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
##
## 0-roll forecast [T0=Feb 2023]:
## Series Sigma
## T+1 15.27 0.4542
## T+2 15.25 0.5037
## T+3 15.24 0.5487
## T+4 15.23 0.5902
## T+5 15.21 0.6290
## T+6 15.20 0.6655
## T+7 15.19 0.7000
## T+8 15.18 0.7329
## T+9 15.16 0.7644
## T+10 15.15 0.7945
## T+11 15.14 0.8236
## T+12 15.12 0.8516
ug_soy <- soy_forecast_garch2@forecast$sigmaFor
#plot(ug_soy, type = "l")
soy_var_t <- c(tail(soy_garch_var,20),rep(NA,10)) # last 20 conditional variances, padded for the forecast horizon
soy_res_t <- c(tail(soy_garch_res,20),rep(NA,10)) # last 20 squared residuals, padded for the forecast horizon
soy_f <- c(rep(NA,20),(ug_soy)^2) # forecast variance aligned after the history
plot(soy_res_t, type = "l") #Squared residuals
lines(soy_f, col = "orange") #Forecast variance
lines(soy_var_t, col = "green") #Conditional variance
legend("topright",
       legend = c("Squared residuals", "Forecast variance", "Conditional variance"),
       col = c("black", "orange", "green"),
       lty = c(1, 1, 1),
       cex = 0.8)

#Plot Predictions
#plot(soy_forecast_garch2, main = "Forecasted soy Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)
soy_mean_forecast <- as.numeric(soy_forecast_garch2@forecast$seriesFor)
# Plot the mean forecasted values with the two confidence intervals
#plot(soy_forecast_garch2, main = "Forecasted soy Prices (GARCH(1,1))")
Soybean Model Comparison
#ARIMA Models
forecast::accuracy(soy_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02571002 0.5071781 0.3576524 0.1286753 3.985675 0.2222804
## ACF1
## Training set -0.00724677
forecast::accuracy(auto_soy)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01700553 0.5204963 0.3658424 0.109258 4.049421 0.2273705
## ACF1
## Training set 0.003829795
forecast::accuracy(soy_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03335124 0.5020207 0.4066544 0.03725125 3.276699 0.1832952
## ACF1
## Training set 0.00698056
## [1] 554.0485
## [1] 589.0484
## [1] 94.91211
## [1] 111.9385
#GARCH Model
#Note: the GARCH comparisons below score forward forecasts against the earliest
#observations of the series rather than a recent holdout, so treat the metrics as illustrative
actual_values_soy <- head(actual_values_soy, length(forecast_mean_soy))
mae <- mean(abs(forecast_mean_soy - actual_values_soy))
mse <- mean((forecast_mean_soy - actual_values_soy)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 0.0434758867098116
cat(paste("MSE: ", mse, "\n"))
## MSE: 0.00269589445359164
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 0.0519220035591043
#GARCH Model 2
forecast_mean_soy2 <- as.numeric(soy_forecast_garch2@forecast$seriesFor)
actual_values_soy2 <- as.numeric(window(soy_ts, start = c(1993, 1)))
actual_values_soy2 <- head(actual_values_soy2, length(forecast_mean_soy2))
soy_mae <- mean(abs(forecast_mean_soy2 - actual_values_soy2))
soy_mse <- mean((forecast_mean_soy2 - actual_values_soy2)^2)
soy_rmse <- sqrt(soy_mse)
# Print the results
cat(paste("MAE: ", soy_mae, "\n"))
## MAE: 8.9266536433183
cat(paste("MSE: ", soy_mse, "\n"))
## MSE: 79.9110106038158
cat(paste("RMSE: ", soy_rmse, "\n"))
## RMSE: 8.93929586733853
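For both commodities the sGARCH(1,1) fits report alpha1 and beta1; their sum is the volatility persistence, and the half-life of a volatility shock follows from it. A quick arithmetic check using the estimates printed in the fit summaries above:

```r
# Shock half-life implied by sGARCH(1,1) persistence (alpha1 + beta1)
half_life <- function(alpha1, beta1) log(0.5) / log(alpha1 + beta1)

alum_persist <- 0.24396 + 0.73411   # aluminum AR(1)-GARCH estimates above
soy_persist  <- 0.539779 + 0.459221 # soybean AR(1)-GARCH estimates above

alum_hl <- half_life(0.24396, 0.73411)   # roughly 31 months
soy_hl  <- half_life(0.539779, 0.459221) # very long: persistence of 0.999 is near-integrated
```

The soybean persistence sitting at 0.999 suggests volatility shocks decay extremely slowly, which is worth flagging to stakeholders alongside the point forecasts.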
Exploring Soybean Oil
# Looking at the price of Soybean Oil over the last 30 years
ggplot(soybeanoil, aes(date, price)) +
geom_line() +
labs(title = "Soybean Oil Price by Day")

#Summary
summary(soybeanoil)
## date price month
## Min. :1993-01-04 Min. :0.1441 Length:7428
## 1st Qu.:2000-05-11 1st Qu.:0.2359 Class :character
## Median :2007-10-02 Median :0.2877 Mode :character
## Mean :2007-10-04 Mean :0.3235
## 3rd Qu.:2015-02-26 3rd Qu.:0.3757
## Max. :2022-06-30 Max. :0.8820
## All-time high price in June 2022
#Soybeanoil monthly average
soyo_m <- soybeanoil %>%
  group_by(month) %>%
  mutate(mon_avg = mean(price)) %>%
  select(month, mon_avg)
#Drop Duplicate rows Soybean Oil
soyo_m <- soyo_m[!duplicated(soyo_m),]
#Creating Time series data Soybean oil
soyo_ts <- ts(soyo_m$mon_avg,start=c(1993),frequency=12)
#Plotting Soybean Oil
chartSeries(soyo_ts)

## Price hikes in 2008, 2012, and the 2020s
#Looking at trends
autoplot(decompose((soyo_ts)),main="")

#Looking for daily or weekly trends Soybean oil
soyo_r <- soybeanoil %>%
filter(date >= as.Date('2022-04-01') & date <= as.Date('2023-06-01'))
ggplot(soyo_r, aes(date, price)) +
geom_line() +
labs(title = "Soybean Oil Prices Daily") + xlab("time") + ylab("prices")

# No price trends by day seen
#Seasonal Plots
#Format back to date in aggregated month column
soyo_m %>%
mutate(month = ym(month)) -> SoyO_mon
#Convert to tibble
SoyO_mon <- as_tibble(SoyO_mon)
#convert data frame into time series tsibble object
SoyO_mon %>% as_tsibble(index = month) -> SoyO_mon_ts
#Format data for Soybean Oil seasonal plots
SoyO_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> SoyO_sea_ts
#Soybean Oil seasonal plot by month
autoplot(SoyO_sea_ts, mon_avg) +
ylab("monthly Soybean Oil prices") +
xlab("")

#Different view Soybean Oil seasonal plot
SoyO_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Soybean Oil prices")

#Soybean Oil seasonal subseries plot
SoyO_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Soybean Oil prices", title = "Seasonal subseries plot: Soybean Oil prices")

Soybean Oil ARIMA Models
##Stationary Test
adf.test(soyo_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: soyo_ts
## Dickey-Fuller = -1.9416, Lag order = 7, p-value = 0.6014
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(soyo_ts), alternative ="stationary")
## Warning in adf.test(diff(soyo_ts), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(soyo_ts)
## Dickey-Fuller = -7.4399, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing brings the ADF p-value below 0.05
#The differenced series is stationary, so d=1
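The two-step ADF workflow above (test the level, then the first difference) can be cross-checked with forecast::ndiffs(), which returns the number of differences needed for stationarity. A hedged sketch on a simulated random walk (the series rw is illustrative, not from this analysis):

```r
# Sketch: ndiffs() automates the "difference until the ADF test rejects"
# workflow used above; it should agree with d = 1 for a random-walk-like
# series such as soyo_ts.
library(forecast)
set.seed(1)
rw <- ts(cumsum(rnorm(240, sd = 1)), frequency = 12)  # simulated random walk
ndiffs(rw, test = "adf")
```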
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soyo_ts),main='')

#q=4
#Looking at the influence of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(soyo_ts),main='')

#p=3. There are 3 significant partial autocorrelation lags
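For reference, the orders read off the ACF and PACF plots slot into the standard ARIMA(p, d, q) form (textbook notation, with B the backshift operator; not from the original report):

```latex
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)\,(1 - B)^d\, y_t
  \;=\; \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr)\,\varepsilon_t
```

Here p = 3 AR terms, d = 1 difference, and q = 4 MA terms give the custom ARIMA(3,1,4) fit below.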
# ARIMA Custom(best fit)
soyo_fit<- Arima(soyo_ts, order=c(3,1,4))
soyo_fit
## Series: soyo_ts
## ARIMA(3,1,4)
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 ma4
## 0.8973 -0.1191 -0.4720 -0.4854 -0.0275 0.3970 0.3082
## s.e. 0.1993 0.2881 0.1848 0.1921 0.2209 0.1205 0.0671
##
## sigma^2 = 0.0003321: log likelihood = 915.98
## AIC=-1815.96 AICc=-1815.54 BIC=-1785.03
#Check residuals
checkresiduals(soyo_fit)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,4)
## Q* = 20.652, df = 17, p-value = 0.2423
##
## Model df: 7. Total lags used: 24
#Auto-fit Arima
auto_soyo<- auto.arima(soyo_ts)
auto_soyo
## Series: soyo_ts
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## 0.4295
## s.e. 0.0484
##
## sigma^2 = 0.0003439: log likelihood = 907.13
## AIC=-1810.27 AICc=-1810.23 BIC=-1802.53
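The AIC values reported for both fits follow the standard definition, where k counts the estimated parameters (the forecast package includes sigma^2 in k) and L-hat is the maximized likelihood:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

For the custom ARIMA(3,1,4), k = 8 (seven ARMA coefficients plus sigma^2) and ln L = 915.98, giving 2(8) - 2(915.98) = -1815.96, matching the output above; lower (more negative) AIC indicates the better fit.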
##Forecast Plots
##Forecast Custom(better model)
autoplot(forecast::forecast(soyo_fit, h=12, level=c(80,95)))

##Forecast Auto (overfit)
autoplot(forecast::forecast(auto_soyo, h=12, level=c(80,95)))

#ARIMA using more recent data
soyo_r2 <- soybeanoil %>%
filter(date >= as.Date('2016-01-01') & date <= as.Date('2023-01-31'))
soyo_r2 <- soyo_r2 %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows
soyo_r2 <- soyo_r2[!duplicated(soyo_r2),]
soyo_tsr <- ts(soyo_r2$mon_avg,start=c(2016),frequency=12)
##Stationary Test
adf.test(soyo_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: soyo_tsr
## Dickey-Fuller = -0.68864, Lag order = 4, p-value = 0.9674
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(soyo_tsr), alternative ="stationary")
## Warning in adf.test(diff(soyo_tsr), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(soyo_tsr)
## Dickey-Fuller = -4.5721, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing brings the ADF p-value below 0.05
#d=1
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(soyo_tsr),main='')

#q=1
#Looking at the influence of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(soyo_tsr),main='')

#p=1. The first partial autocorrelation lag is dominant
# ARIMA Custom(better model)
soyo_fitR<- Arima(soyo_tsr, order=c(1,1,1))
soyo_fitR
## Series: soyo_tsr
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.5389 -0.0714
## s.e. 0.1668 0.1882
##
## sigma^2 = 0.0005035: log likelihood = 183.98
## AIC=-361.97 AICc=-361.64 BIC=-354.93
#Check residuals
checkresiduals(soyo_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)
## Q* = 12.019, df = 14, p-value = 0.6048
##
## Model df: 2. Total lags used: 16
##Forecast Custom
autoplot(forecast::forecast(soyo_fitR, h=12, level=c(80,95)))

#Printing Predictions
soyo_predictions <- forecast::forecast(soyo_fitR,h=12)
print(soyo_predictions$mean)
## Jan Feb Mar Apr May Jun Jul
## 2022 0.7382803
## 2023 0.7128862 0.7125922 0.7124337 0.7123484 0.7123024 0.7122776
## Aug Sep Oct Nov Dec
## 2022 0.7262769 0.7198084 0.7163225 0.7144440 0.7134317
## 2023
GARCH Soybean Oil Modeling
# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(0,0), include.mean = TRUE), distribution.model = "std")
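The spec above corresponds to a constant-mean sGARCH(1,1) with Student-t innovations (standard notation; the t shape parameter nu is reported as shape in the fit output):

```latex
y_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim t_{\nu},
\qquad \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2
```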
# Fitting Model to Data
soyo_garch <- ugarchfit(spec = garch_model, data = soyo_ts)
print(soyo_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 2.654e-01 0.005163 51.4028 0.000000
## omega 1.320e-04 0.000090 1.4727 0.140832
## alpha1 9.990e-01 0.361641 2.7624 0.005738
## beta1 0.000e+00 0.354191 0.0000 1.000000
## shape 1.000e+02 57.346231 1.7438 0.081195
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 2.654e-01 0.033620 7.89424 0.00000
## omega 1.320e-04 0.000465 0.28508 0.77558
## alpha1 9.990e-01 2.128517 0.46934 0.63883
## beta1 0.000e+00 2.119646 0.00000 1.00000
## shape 1.000e+02 2.772805 36.06456 0.00000
##
## LogLikelihood : 470.9922
##
## Information Criteria
## ------------------------------------
##
## Akaike -2.6327
## Bayes -2.5781
## Shibata -2.6331
## Hannan-Quinn -2.6110
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 282.7 0
## Lag[2*(p+q)+(p+q)-1][2] 404.9 0
## Lag[4*(p+q)+(p+q)-1][5] 715.7 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 3.581 0.05846
## Lag[2*(p+q)+(p+q)-1][5] 10.024 0.00908
## Lag[4*(p+q)+(p+q)-1][9] 14.070 0.00598
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.4061 0.500 2.000 0.5240
## ARCH Lag[5] 1.4593 1.440 1.667 0.6031
## ARCH Lag[7] 4.0776 2.315 1.543 0.3355
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 80.7041
## Individual Statistics:
## mu 7.8160
## omega 0.5892
## alpha1 0.4907
## beta1 0.1933
## shape 41.3785
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 1.1676 0.2438
## Negative Sign Bias 0.0256 0.9796
## Positive Sign Bias 0.5212 0.6026
## Joint Effect 2.5982 0.4578
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 505.2 4.510e-95
## 2 30 555.2 1.241e-98
## 3 40 537.3 7.159e-89
## 4 50 551.1 4.132e-86
##
##
## Elapsed time : 0.182143
#plot(soyo_garch, which = 1)
coef(soyo_garch)
## mu omega alpha1 beta1 shape
## 2.654041e-01 1.324920e-04 9.989999e-01 1.589806e-08 9.999998e+01
# Forecasting
horizon <- 3
soyo_vol <-ts(soyo_garch@fit$sigma^2,start=c(1993),frequency=12)
soyo_forecast_garch <- ugarchforecast(soyo_garch, n.ahead = 12)
forecast_mean_soyo <- as.numeric(soyo_forecast_garch@forecast$seriesFor)
actual_values_soyo <- as.numeric(window(soyo_ts, start = c(1993, 1)))
#plot(soyo_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
#main = "GARCH Forecast for soyo Prices", ylab = "Price", xlab = "Time") #%>%
#lines(soyo_ts[(length(soyo_ts)-horizon+1):length(soyo_ts)], col = "blue")
#Based on the coefficient significance, try an AR(1) mean model
garch_model2 <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(1,0), include.mean = TRUE), distribution.model = "std")
soyo_garch2 <- ugarchfit(spec = garch_model2, data = soyo_ts)
## Warning in arima(data, order = c(modelinc[2], 0, modelinc[3]), include.mean =
## modelinc[1], : possible convergence problem: optim gave code = 1
## Warning in .sgarchfit(spec = spec, data = data, out.sample = out.sample, :
## ugarchfit-->warning: solver failer to converge.
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Convergence Problem:
## Solver Message:
#Time Series based on volatility or Variance based on a standard Garch [1,1] model
#soyo_vol2 <-ts(soyo_garch2@fit$sigma^2,start=c(1993),frequency=12)
#plot(soyo_vol2,xlab="",ylab="",main="soyo_Volatility (GARCH[1,1])")
#GARCH 3
names(soyo_garch2@model)
## [1] "modelinc" "modeldesc" "modeldata" "pars" "start.pars"
## [6] "fixed.pars" "maxOrder" "pos.matrix" "fmodel" "pidx"
## [11] "n.start"
names(soyo_garch2@fit)
## [1] "convergence" "solver"
#Variance
soyo_garch_var <- soyo_garch2@fit$var
#Residuals
soyo_garch_res <- (soyo_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
#plot(soyo_garch_res, type = "l")
#lines(soyo_garch_var, col = "green")
#soyo_forecast_garch2 <- ugarchforecast(soyo_garch2, n.ahead = 12)
#soyo_forecast_garch2
Soybean Oil Model Comparison
#ARIMA Models
forecast::accuracy(soyo_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008097532 0.0180158 0.01283394 0.1543794 3.959575 0.1940801
## ACF1
## Training set -0.006675165
forecast::accuracy(auto_soyo)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008238039 0.01849189 0.01313962 0.1557511 4.030274 0.1987027
## ACF1
## Training set -0.008424143
forecast::accuracy(soyo_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002605911 0.02200404 0.01554205 0.5402573 3.714631 0.1878383
## ACF1
## Training set -0.01925028
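The accuracy() calls above report training-set error only, which favors the more heavily parameterized fit. A hedged sketch of a holdout comparison (the function name holdout_rmse and the simulated series are illustrative, not from the original analysis):

```r
# Sketch: compare ARIMA candidates on a 12-month holdout instead of
# training error. Assumes the forecast package, as used throughout.
library(forecast)

holdout_rmse <- function(y, order, h = 12) {
  n     <- length(y)
  train <- ts(head(as.numeric(y), n - h), start = start(y),
              frequency = frequency(y))            # all but the last h months
  test  <- tail(as.numeric(y), h)                  # last h months held out
  fc    <- forecast::forecast(Arima(train, order = order), h = h)
  sqrt(mean((as.numeric(fc$mean) - test)^2))       # out-of-sample RMSE
}

# Illustrative use on a simulated monthly series; in this report it would be
# holdout_rmse(soyo_ts, c(3, 1, 4)) vs holdout_rmse(soyo_ts, c(1, 1, 0)).
set.seed(123)
y <- ts(10 + cumsum(rnorm(120, sd = 0.2)), start = c(2013, 1), frequency = 12)
holdout_rmse(y, c(1, 1, 0))
```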
AIC(soyo_fit)
## [1] -1815.961
BIC(soyo_fit)
## [1] -1785.029
AIC(soyo_fitR)
## [1] -361.9659
BIC(soyo_fitR)
## [1] -354.9345
#GARCH Model
#Note: this aligns the 12-step-ahead forecast with the first 12 observations
#of the series (1993), so these error metrics are a rough sanity check only
actual_values_soyo <- head(actual_values_soyo, length(forecast_mean_soyo))
mae <- mean(abs(forecast_mean_soyo - actual_values_soyo))
mse <- mean((forecast_mean_soyo - actual_values_soyo)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 0.0388457606662608
cat(paste("MSE: ", mse, "\n"))
## MSE: 0.0017633098679844
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 0.041991783338939
Exploring Corn
# Looking at the price of Corn over the last 30 years
ggplot(corn, aes(date, price)) +
geom_line() +
labs(title = "Corn Price by Day")

#Summary Corn
summary(corn)
## date price month
## Min. :1993-01-04 Min. :1.748 Length:7605
## 1st Qu.:2000-07-18 1st Qu.:2.365 Class :character
## Median :2008-02-05 Median :3.506 Mode :character
## Mean :2008-02-02 Mean :3.696
## 3rd Qu.:2015-08-20 3rd Qu.:4.210
## Max. :2023-02-17 Max. :8.312
## Corn price reached an all-time high in February of 2023
#Corn monthly average
corn_m <- corn %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows Corn
corn_m <- corn_m[!duplicated(corn_m),]
#Creating Time series data Corn
corn_ts <- ts(corn_m$mon_avg,start=c(1993),frequency=12)
#Plotting Corn
chartSeries(corn_ts)

## Price spikes in 1995, 2008, the early 2010s, and the 2020s
#Looking at trends
autoplot(decompose((corn_ts)),main="")

#Looking for daily or weekly trends Corn
corn_r <- corn %>%
filter(date >= as.Date('2022-11-01') & date <= as.Date('2023-01-31'))
ggplot(corn_r, aes(date, price)) +
geom_line() +
labs(title = "Corn Prices Daily") + xlab("time") + ylab("prices")

## Price does not seem to fluctuate by day of the week
#Seasonal Plots
#Format back to date in aggregated month column
corn_m %>%
mutate(month = ym(month)) -> Corn_mon
#Convert to tibble
Corn_mon <- as_tibble(Corn_mon)
#convert data frame into time series tsibble object
Corn_mon %>% as_tsibble(index = month) -> Corn_mon_ts
#Format data for Corn seasonal plots
Corn_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> Corn_sea_ts
#Corn seasonal plot by month
autoplot(Corn_sea_ts, mon_avg) +
ylab("monthly Corn prices") +
xlab("")

#Different view Corn seasonal plot
Corn_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Corn prices")

#Corn seasonal subseries plot
Corn_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Corn prices", title = "Seasonal subseries plot: Corn prices")

Corn ARIMA models
##Stationary Test
adf.test(corn_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: corn_ts
## Dickey-Fuller = -2.6343, Lag order = 7, p-value = 0.3092
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(corn_ts), alternative ="stationary")
## Warning in adf.test(diff(corn_ts), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(corn_ts)
## Dickey-Fuller = -8.179, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing brings the ADF p-value below 0.05
#The differenced series is stationary, so d=1
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(corn_ts),main='')

#q=4
#Looking at the influence of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(corn_ts),main='')

#p=4. There are 4 significant partial autocorrelation lags
# ARIMA Custom(best fit)
corn_fit<- Arima(corn_ts, order=c(4,1,4))
corn_fit
## Series: corn_ts
## ARIMA(4,1,4)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 1.1854 -1.0570 0.8632 -0.0586 -0.8941 0.8123 -0.5885 -0.2947
## s.e. 0.1538 0.1969 0.1838 0.1383 0.1474 0.1808 0.1789 0.1434
##
## sigma^2 = 0.07194: log likelihood = -34.06
## AIC=86.12 AICc=86.63 BIC=121.12
#Check residuals
checkresiduals(corn_fit)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,4)
## Q* = 33.005, df = 16, p-value = 0.007378
##
## Model df: 8. Total lags used: 24
#Auto-fit Arima
auto_corn<- auto.arima(corn_ts)
auto_corn
## Series: corn_ts
## ARIMA(0,1,1)(0,0,1)[12]
##
## Coefficients:
## ma1 sma1
## 0.3229 -0.1350
## s.e. 0.0504 0.0571
##
## sigma^2 = 0.07494: log likelihood = -43.71
## AIC=93.41 AICc=93.48 BIC=105.08
checkresiduals(auto_corn)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,0,1)[12]
## Q* = 37.689, df = 22, p-value = 0.01985
##
## Model df: 2. Total lags used: 24
##Forecast Plots
##Forecast Custom(better model)
autoplot(forecast::forecast(corn_fit, h=12, level=c(80,95)))

##Forecast Auto (overfit)
autoplot(forecast::forecast(auto_corn, h=12, level=c(80,95)))

#ARIMA using more recent data
corn_r2 <- corn %>%
filter(date >= as.Date('2016-01-01') & date <= as.Date('2023-01-31'))
corn_r2 <- corn_r2 %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows
corn_r2 <- corn_r2[!duplicated(corn_r2),]
corn_tsr <- ts(corn_r2$mon_avg,start=c(2016),frequency=12)
##Stationary Test
adf.test(corn_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: corn_tsr
## Dickey-Fuller = -2.2964, Lag order = 4, p-value = 0.4543
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(corn_tsr), alternative ="stationary")
## Warning in adf.test(diff(corn_tsr), alternative = "stationary"): p-value smaller
## than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(corn_tsr)
## Dickey-Fuller = -4.7015, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#First-order differencing brings the ADF p-value below 0.05
#d=1
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(corn_tsr),main='')

#q=1
#Looking at the influence of prior-period prices on subsequent periods
#PACF (p)
pacf(diff(corn_tsr),main='')

#p=3. There are 3 significant partial autocorrelation lags
# ARIMA Custom(overfit)
corn_fitR<- Arima(corn_tsr, order=c(3,1,1))
corn_fitR
## Series: corn_tsr
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 0.3584 -0.2581 0.0789 0.1171
## s.e. 0.5370 0.2613 0.1792 0.5311
##
## sigma^2 = 0.07726: log likelihood = -9.75
## AIC=29.5 AICc=30.27 BIC=41.66
#Check residuals
checkresiduals(corn_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 16.143, df = 13, p-value = 0.2415
##
## Model df: 4. Total lags used: 17
#Auto-fit Arima(overfit)
auto_cornR<- auto.arima(corn_tsr)
auto_cornR
## Series: corn_tsr
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## 0.4938
## s.e. 0.0957
##
## sigma^2 = 0.07547: log likelihood = -10.3
## AIC=24.59 AICc=24.74 BIC=29.46
checkresiduals(auto_cornR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 16.545, df = 16, p-value = 0.4156
##
## Model df: 1. Total lags used: 17
##Forecast Plot
autoplot(forecast::forecast(auto_cornR, h=12, level=c(80,95)))

##Forecast Custom
autoplot(forecast::forecast(corn_fitR, h=12, level=c(80,95)))

#Printing Predictions
corn_predictions <- forecast::forecast(corn_fitR,h=12)
print(corn_predictions$mean)
## Jan Feb Mar Apr May Jun Jul Aug
## 2023 6.833895 6.815075 6.791769 6.798035 6.804812 6.803784 6.802160
## 2024 6.802662
## Sep Oct Nov Dec
## 2023 6.802379 6.802795 6.802760 6.802657
## 2024
corn_predictions <- forecast::forecast(auto_cornR,h=12)
print(corn_predictions$mean)
## Jan Feb Mar Apr May Jun Jul Aug
## 2023 6.837366 6.837366 6.837366 6.837366 6.837366 6.837366 6.837366
## 2024 6.837366
## Sep Oct Nov Dec
## 2023 6.837366 6.837366 6.837366 6.837366
## 2024
GARCH Corn Modeling
# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(0,0), include.mean = TRUE), distribution.model = "std")
# Fitting Model to Data
corn_garch <- ugarchfit(spec = garch_model, data = corn_ts)
print(corn_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 3.629372 0.029175 124.3990 0.00000
## omega 0.032509 0.005407 6.0122 0.00000
## alpha1 0.998999 0.047015 21.2486 0.00000
## beta1 0.000000 0.080373 0.0000 1.00000
## shape 99.992685 68.196361 1.4662 0.14258
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 3.629372 0.055652 65.2160 0.000000
## omega 0.032509 0.009629 3.3762 0.000735
## alpha1 0.998999 0.119584 8.3540 0.000000
## beta1 0.000000 0.118365 0.0000 1.000000
## shape 99.992685 15.275796 6.5458 0.000000
##
## LogLikelihood : -456.3449
##
## Information Criteria
## ------------------------------------
##
## Akaike 2.5489
## Bayes 2.6026
## Shibata 2.5485
## Hannan-Quinn 2.5702
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 261.4 0
## Lag[2*(p+q)+(p+q)-1][2] 357.4 0
## Lag[4*(p+q)+(p+q)-1][5] 580.3 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 13.09 0.0002968
## Lag[2*(p+q)+(p+q)-1][5] 14.17 0.0007417
## Lag[4*(p+q)+(p+q)-1][9] 16.46 0.0015718
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.8891 0.500 2.000 0.3457
## ARCH Lag[5] 1.3093 1.440 1.667 0.6439
## ARCH Lag[7] 2.7544 2.315 1.543 0.5611
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 47.7516
## Individual Statistics:
## mu 3.7398
## omega 0.2365
## alpha1 0.5115
## beta1 1.1196
## shape 29.9616
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 2.489 0.01328 **
## Negative Sign Bias 1.138 0.25570
## Positive Sign Bias 1.172 0.24180
## Joint Effect 7.104 0.06865 *
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 573.6 1.881e-109
## 2 30 631.5 1.871e-114
## 3 40 668.1 1.605e-115
## 4 50 707.3 1.679e-117
##
##
## Elapsed time : 0.1471701
#plot(corn_garch, which = 1)
coef(corn_garch)
## mu omega alpha1 beta1 shape
## 3.629372e+00 3.250919e-02 9.989994e-01 2.010076e-08 9.999268e+01
# Forecasting
horizon <- 3
corn_forecast_garch <- ugarchforecast(corn_garch, n.ahead = horizon)
corn_forecast_mean <- as.numeric(corn_forecast_garch@forecast$seriesFor)
corn_actual_values <- as.numeric(window(corn_ts, start = c(1993, 1)))
#plot(corn_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
# main = "GARCH Forecast for corn Prices", ylab = "Price", xlab = "Time") #%>%
#lines(corn_ts[(length(corn_ts)-horizon+1):length(corn_ts)], col = "blue")
#Based on the coefficient significance, try an AR(1) mean model
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(1,0), include.mean = TRUE), distribution.model = "std")
corn_garch2 <- ugarchfit(spec = garch_model, data = corn_ts)
corn_garch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 2.18098 0.141929 15.3666 0.000000
## ar1 0.99876 0.007648 130.5945 0.000000
## omega 0.00450 0.003260 1.3802 0.167538
## alpha1 0.23114 0.108458 2.1311 0.033078
## beta1 0.76786 0.108470 7.0790 0.000000
## shape 3.16313 0.534512 5.9178 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 2.18098 0.031853 68.4709 0.000000
## ar1 0.99876 0.008974 111.2930 0.000000
## omega 0.00450 0.004201 1.0712 0.284073
## alpha1 0.23114 0.130453 1.7718 0.076425
## beta1 0.76786 0.155690 4.9320 0.000001
## shape 3.16313 0.498626 6.3437 0.000000
##
## LogLikelihood : 23.50113
##
## Information Criteria
## ------------------------------------
##
## Akaike -0.096691
## Bayes -0.032189
## Shibata -0.097229
## Hannan-Quinn -0.071049
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 19.93 8.051e-06
## Lag[2*(p+q)+(p+q)-1][2] 19.96 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5] 20.53 1.362e-08
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.1213 0.7276
## Lag[2*(p+q)+(p+q)-1][5] 0.4571 0.9641
## Lag[4*(p+q)+(p+q)-1][9] 1.6224 0.9450
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.01666 0.500 2.000 0.8973
## ARCH Lag[5] 0.73569 1.440 1.667 0.8126
## ARCH Lag[7] 1.40213 2.315 1.543 0.8408
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 0.9697
## Individual Statistics:
## mu 0.22975
## ar1 0.07387
## omega 0.26481
## alpha1 0.13692
## beta1 0.21882
## shape 0.18026
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.3529 0.7244
## Negative Sign Bias 1.2854 0.1995
## Positive Sign Bias 0.1674 0.8672
## Joint Effect 1.8435 0.6055
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 28.61 0.07241
## 2 30 37.12 0.14328
## 3 40 47.28 0.17033
## 4 50 58.72 0.16115
##
##
## Elapsed time : 0.130918
#Time Series based on volatility or Variance based on a standard Garch [1,1] model
corn_vol <-ts(corn_garch2@fit$sigma^2,start=c(1993),frequency=12)
#plot(corn_vol,xlab="",ylab="",main="corn_Volatility (GARCH[1,1])")
#Exponential GARCH
Egarch_model <- ugarchspec(variance.model = list(model = "eGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(1,0), include.mean = TRUE), distribution.model = "std")
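Unlike the sGARCH spec, the eGARCH spec above models the log-variance, which allows asymmetric responses to positive and negative shocks (rugarch's eGARCH(1,1) form in standard notation; gamma_1 is the asymmetry term):

```latex
\ln\sigma_t^2 = \omega + \alpha_1 z_{t-1}
  + \gamma_1\bigl(\lvert z_{t-1}\rvert - E\lvert z_{t-1}\rvert\bigr)
  + \beta_1 \ln\sigma_{t-1}^2
```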
corn_egarch2 <- ugarchfit(spec = garch_model, data = corn_ts) #NOTE: this reuses the sGARCH spec; pass spec = Egarch_model to actually fit the eGARCH (the output below reflects the sGARCH refit)
corn_egarch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 2.18098 0.141929 15.3666 0.000000
## ar1 0.99876 0.007648 130.5945 0.000000
## omega 0.00450 0.003260 1.3802 0.167538
## alpha1 0.23114 0.108458 2.1311 0.033078
## beta1 0.76786 0.108470 7.0790 0.000000
## shape 3.16313 0.534512 5.9178 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 2.18098 0.031853 68.4709 0.000000
## ar1 0.99876 0.008974 111.2930 0.000000
## omega 0.00450 0.004201 1.0712 0.284073
## alpha1 0.23114 0.130453 1.7718 0.076425
## beta1 0.76786 0.155690 4.9320 0.000001
## shape 3.16313 0.498626 6.3437 0.000000
##
## LogLikelihood : 23.50113
##
## Information Criteria
## ------------------------------------
##
## Akaike -0.096691
## Bayes -0.032189
## Shibata -0.097229
## Hannan-Quinn -0.071049
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 19.93 8.051e-06
## Lag[2*(p+q)+(p+q)-1][2] 19.96 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5] 20.53 1.362e-08
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.1213 0.7276
## Lag[2*(p+q)+(p+q)-1][5] 0.4571 0.9641
## Lag[4*(p+q)+(p+q)-1][9] 1.6224 0.9450
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.01666 0.500 2.000 0.8973
## ARCH Lag[5] 0.73569 1.440 1.667 0.8126
## ARCH Lag[7] 1.40213 2.315 1.543 0.8408
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 0.9697
## Individual Statistics:
## mu 0.22975
## ar1 0.07387
## omega 0.26481
## alpha1 0.13692
## beta1 0.21882
## shape 0.18026
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.3529 0.7244
## Negative Sign Bias 1.2854 0.1995
## Positive Sign Bias 0.1674 0.8672
## Joint Effect 1.8435 0.6055
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 28.61 0.07241
## 2 30 37.12 0.14328
## 3 40 47.28 0.17033
## 4 50 58.72 0.16115
##
##
## Elapsed time : 0.1281919
## mu ar1 omega alpha1 beta1 shape
## 2.180975972 0.998758590 0.004499884 0.231137911 0.767862088 3.163133182
#Time Series based on volatility or Variance based on a standard Garch [1,1] model
ecorn_vol <-ts(corn_egarch2@fit$sigma^2,start=c(1993),frequency=12)
#plot(ecorn_vol,xlab="",ylab="",main="corn_Volatility (eGARCH[1,1])")
cor(corn_vol, ecorn_vol)
## [1] 1
#ts.plot(corn_vol,ecorn_vol,col=c("green","red"),xlab="")
#legend("topright",legend=c("Standard","Exponential"),col=c("green","red"),lty=c(1,1))
#A correlation of 1 is expected here: corn_egarch2 refit the same sGARCH spec, so the two volatility series are identical
#GARCH 3
names(corn_garch2@model)
## [1] "modelinc" "modeldesc" "modeldata" "pars" "start.pars"
## [6] "fixed.pars" "maxOrder" "pos.matrix" "fmodel" "pidx"
## [11] "n.start"
names(corn_garch2@fit)
## [1] "hessian" "cvar" "var" "sigma"
## [5] "condH" "z" "LLH" "log.likelihoods"
## [9] "residuals" "coef" "robust.cvar" "A"
## [13] "B" "scores" "se.coef" "tval"
## [17] "matcoef" "robust.se.coef" "robust.tval" "robust.matcoef"
## [21] "fitted.values" "convergence" "kappa" "persistence"
## [25] "timer" "ipars" "solver"
#Variance
corn_garch_var <- corn_garch2@fit$var
#Residuals
corn_garch_res <- (corn_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
plot(corn_garch_res, type = "l")
lines(corn_garch_var, col = "green")

corn_forecast_garch2 <- ugarchforecast(corn_garch2, n.ahead = 12)
corn_forecast_garch2
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
##
## 0-roll forecast [T0=Feb 2023]:
## Series Sigma
## T+1 6.774 0.3252
## T+2 6.768 0.3319
## T+3 6.763 0.3385
## T+4 6.757 0.3449
## T+5 6.751 0.3512
## T+6 6.745 0.3573
## T+7 6.740 0.3634
## T+8 6.734 0.3694
## T+9 6.728 0.3752
## T+10 6.723 0.3810
## T+11 6.717 0.3867
## T+12 6.712 0.3923
ug_corn <- corn_forecast_garch2@forecast$sigmaFor
plot(ug_corn, type = "l")

corn_var_t <- c(tail(corn_garch_var,20),rep(NA,10)) # gets the last 20 observations
corn_res_t <- c(tail(corn_garch_res,20),rep(NA,10)) # gets the last 20 observations
corn_f <- c(rep(NA,20),(ug_corn)^2)
plot(corn_res_t, type = "l") #Residuals
lines(corn_f, col = "orange") # Predictions
lines(corn_var_t, col = "green") #Conditional Forecast
legend("topright",
legend = c("Residuals", "Predictions", "Conditional Forecast"),
col = c("black", "orange", "green"),
lty = c(1, 1, 1),
cex = 0.8)

#Plot Predictions
#plot(corn_forecast_garch2, main = "Forecasted corn Prices (GARCH(1,1))")
#legend("bottomright", legend = c("Mean", "Lower 95% CI", "Upper 95% CI"), col = c("black", "blue", "red"), lty = 1)
corn_mean_forecast <- as.numeric(corn_forecast_garch2@forecast$seriesFor)
Corn Model Comparison
#ARIMA Models
forecast::accuracy(corn_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01712893 0.2648581 0.1742406 0.2067825 4.61023 0.2193029
## ACF1
## Training set -0.002718111
forecast::accuracy(auto_corn)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01097705 0.2726103 0.1791116 0.1083757 4.71269 0.2254337
## ACF1
## Training set 0.007004455
forecast::accuracy(corn_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02756006 0.269654 0.1843457 0.4523231 3.98982 0.2603172
## ACF1
## Training set -0.002711066
forecast::accuracy(auto_cornR)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02538322 0.2714618 0.1854949 0.4239981 4.019807 0.2619401
## ACF1
## Training set -0.02101573
AIC(corn_fit)
## [1] 86.11677
BIC(corn_fit)
## [1] 121.1167
AIC(corn_fitR)
## [1] 29.50332
BIC(corn_fitR)
## [1] 41.6574
#GARCH Model
#Note: corn_mean_forecast holds 12 values but corn_actual_values is cut to
#length(corn_forecast_mean) (3), so R silently recycles the shorter vector
#below; the forecasts are also compared against the earliest 1993 observations,
#so these error metrics are a rough sanity check only
corn_actual_values <- head(corn_actual_values, length(corn_forecast_mean))
mae <- mean(abs(corn_mean_forecast - corn_actual_values))
mse <- mean((corn_mean_forecast - corn_actual_values)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 4.5589735838937
cat(paste("MSE: ", mse, "\n"))
## MSE: 20.7869575052084
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 4.55927159809639
forecast_mean_corn2 <- as.numeric(corn_forecast_garch2@forecast$seriesFor)
actual_values_corn2 <- as.numeric(window(corn_ts, start = c(1993, 1)))
actual_values_corn2 <- head(actual_values_corn2, length(forecast_mean_corn2))
corn_mae <- mean(abs(forecast_mean_corn2 - actual_values_corn2))
corn_mse <- mean((forecast_mean_corn2 - actual_values_corn2)^2)
corn_rmse <- sqrt(corn_mse)
# Print the results
cat(paste("MAE: ", corn_mae, "\n"))
## MAE: 4.3509781922319
cat(paste("MSE: ", corn_mse, "\n"))
## MSE: 18.9941872437891
cat(paste("RMSE: ", corn_rmse, "\n"))
## RMSE: 4.35823212366999
## Estimate Std. Error t value Pr(>|t|)
## mu 2.18098 0.141929 15.3666 0.000000
## ar1 0.99876 0.007648 130.5945 0.000000
## omega 0.00450 0.003260 1.3802 0.167538
## alpha1 0.23114 0.108458 2.1311 0.033078
## beta1 0.76786 0.108470 7.0790 0.000000
## shape 3.16313 0.534512 5.9178 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 2.18098 0.031853 68.4709 0.000000
## ar1 0.99876 0.008974 111.2930 0.000000
## omega 0.00450 0.004201 1.0712 0.284073
## alpha1 0.23114 0.130453 1.7718 0.076425
## beta1 0.76786 0.155690 4.9320 0.000001
## shape 3.16313 0.498626 6.3437 0.000000
##
## LogLikelihood : 23.50113
##
## Information Criteria
## ------------------------------------
##
## Akaike -0.096691
## Bayes -0.032189
## Shibata -0.097229
## Hannan-Quinn -0.071049
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 19.93 8.051e-06
## Lag[2*(p+q)+(p+q)-1][2] 19.96 0.000e+00
## Lag[4*(p+q)+(p+q)-1][5] 20.53 1.362e-08
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.1213 0.7276
## Lag[2*(p+q)+(p+q)-1][5] 0.4571 0.9641
## Lag[4*(p+q)+(p+q)-1][9] 1.6224 0.9450
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.01666 0.500 2.000 0.8973
## ARCH Lag[5] 0.73569 1.440 1.667 0.8126
## ARCH Lag[7] 1.40213 2.315 1.543 0.8408
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 0.9697
## Individual Statistics:
## mu 0.22975
## ar1 0.07387
## omega 0.26481
## alpha1 0.13692
## beta1 0.21882
## shape 0.18026
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.3529 0.7244
## Negative Sign Bias 1.2854 0.1995
## Positive Sign Bias 0.1674 0.8672
## Joint Effect 1.8435 0.6055
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 28.61 0.07241
## 2 30 37.12 0.14328
## 3 40 47.28 0.17033
## 4 50 58.72 0.16115
##
##
## Elapsed time : 0.130918
Exploring Cotton
# Looking at the price of Cotton over the last 30 years
ggplot(cotton, aes(date, price)) +
geom_line() +
labs(title = "Cotton Prices Over Time")

#Summary of Cotton
summary(cotton)
## date price month
## Min. :1993-01-04 Min. :0.2852 Length:7586
## 1st Qu.:2000-07-27 1st Qu.:0.5757 Class :character
## Median :2008-03-10 Median :0.6839 Mode :character
## Mean :2008-02-21 Mean :0.7143
## 3rd Qu.:2015-09-16 3rd Qu.:0.8070
## Max. :2023-02-17 Max. :2.1414
#Cotton monthly average
cotton_m <- cotton %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows Cotton
cotton_m <- cotton_m[!duplicated(cotton_m),]
#Creating Time series data Cotton
cotton_ts <- ts(cotton_m$mon_avg,start=c(1993),frequency=12)
#Plotting Cotton
chartSeries(cotton_ts)

## Huge price spike around 2011
#Looking at trends
autoplot(decompose((cotton_ts)),main="")

#Looking for daily or weekly trends Cotton
cotton_r <- cotton %>%
filter(date >= as.Date('2022-01-01') & date <= as.Date('2022-12-31'))
ggplot(cotton_r, aes(date, price)) +
geom_line() +
labs(title = "Cotton Prices Daily") + xlab("time") + ylab("prices")

## Price does not seem to fluctuate by day of the week
#Seasonal Plots
#Format back to date in aggregated month column
cotton_m %>%
mutate(month = ym(month)) -> Cotton_mon
#Convert to tibble
Cotton_mon <- as_tibble(Cotton_mon)
#convert data frame into time series tsibble object
Cotton_mon %>% as_tsibble(index = month) -> Cotton_mon_ts
#Format data for Cotton seasonal plots
Cotton_mon_ts %>%
mutate(Month = tsibble::yearmonth(month)) %>%
as_tsibble(index = Month) %>%
dplyr::select(Month,mon_avg) -> Cotton_sea_ts
#Cotton seasonal plot by month
autoplot(Cotton_sea_ts, mon_avg) +
ylab("monthly Cotton prices") +
xlab("")

#Different view Cotton seasonal plot
Cotton_sea_ts %>% gg_season(mon_avg, labels = "both") +
ylab("Monthly Cotton prices")

#Cotton seasonal subseries plot
Cotton_sea_ts %>% gg_subseries(mon_avg) +
labs(y = "Cotton prices", title = "Seasonal subseries plot: Cotton prices")

Cotton ARIMA Models
##Stationary Test
adf.test(cotton_ts, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: cotton_ts
## Dickey-Fuller = -3.6639, Lag order = 7, p-value = 0.02708
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(cotton_ts), alternative ="stationary")
## Warning in adf.test(diff(cotton_ts), alternative = "stationary"): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(cotton_ts)
## Dickey-Fuller = -8.0778, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Differencing lowers the p-value further, but the undifferenced series is
#already significant (p = 0.027), so d = 0
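As a cross-check on the ADF-based choice of d, `forecast::ndiffs()` estimates the number of differences needed for stationarity under several unit-root tests. A minimal sketch (our addition, not part of the original workflow), assuming the `forecast` package is available and `cotton_ts` is the monthly series built above:

```r
library(forecast)

# Estimate the differencing order d under three different unit-root tests.
# Agreement across tests adds confidence in the manual ADF-based choice.
d_kpss <- ndiffs(cotton_ts, test = "kpss")  # KPSS: H0 = stationary
d_adf  <- ndiffs(cotton_ts, test = "adf")   # ADF:  H0 = unit root
d_pp   <- ndiffs(cotton_ts, test = "pp")    # Phillips-Perron
c(kpss = d_kpss, adf = d_adf, pp = d_pp)
```

If the tests disagree, the conservative choice is the larger d, at the cost of possibly over-differencing.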
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(cotton_ts),main='')

#q=6
#Looking at the impact prices in prior periods have on subsequent periods
#PACF (p)
pacf(diff(cotton_ts),main='')

#p = 4: there are 4 significant partial autocorrelation lags
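Reading p and q off the ACF/PACF gives one candidate order; as a sanity check, a small grid search over nearby orders compared on AICc can confirm it. A sketch (an addition for illustration, not the original selection method), assuming `forecast` is loaded and `cotton_ts` exists:

```r
library(forecast)

# Compare candidate ARMA orders around the ACF/PACF suggestion (p = 4, q = 6)
# using AICc; lower is better. Fits that fail to converge are skipped.
orders <- expand.grid(p = 0:4, q = 0:6)
orders$AICc <- apply(orders, 1, function(o) {
  fit <- tryCatch(Arima(cotton_ts, order = c(o["p"], 0, o["q"])),
                  error = function(e) NULL)
  if (is.null(fit)) NA else fit$aicc
})
head(orders[order(orders$AICc), ], 5)  # five best orders by AICc
```

This only screens mean-model orders; residual diagnostics (as with `checkresiduals()` below) still decide whether the winner is adequate.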
# Custom ARIMA (best fit)
cotton_fit<- Arima(cotton_ts, order=c(4,0,6))
cotton_fit
## Series: cotton_ts
## ARIMA(4,0,6) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4 ma5
## -0.1497 0.0744 0.6415 0.1818 1.3282 1.3555 0.7720 0.4236 0.1456
## s.e. 0.3350 0.0994 0.1921 0.2557 0.3312 0.3997 0.4156 0.2321 0.1993
## ma6 mean
## -0.0667 0.716
## s.e. 0.1089 0.058
##
## sigma^2 = 0.003459: log likelihood = 516.02
## AIC=-1008.04 AICc=-1007.15 BIC=-961.34
#Check residuals
checkresiduals(cotton_fit)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,0,6) with non-zero mean
## Q* = 21.874, df = 14, p-value = 0.08124
##
## Model df: 10. Total lags used: 24
#Auto-fit Arima
auto_cotton<- auto.arima(cotton_ts)
auto_cotton
## Series: cotton_ts
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.5245 -0.3203
## s.e. 0.1458 0.1594
##
## sigma^2 = 0.003706: log likelihood = 499.12
## AIC=-992.25 AICc=-992.18 BIC=-980.58
checkresiduals(auto_cotton)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)
## Q* = 43.336, df = 22, p-value = 0.00429
##
## Model df: 2. Total lags used: 24
##Forecast Plots
##Forecast Custom
autoplot(forecast::forecast(cotton_fit, h=12, level=c(80,95))) #(better model)

##Forecast Auto
autoplot(forecast::forecast(auto_cotton, h=12, level=c(80,95)))#(overfit)
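The "better model" and "overfit" labels above rest on in-sample diagnostics; a holdout split makes the comparison out-of-sample. A sketch, assuming `forecast` is loaded and reserving the final 12 months of `cotton_ts` as a test window:

```r
library(forecast)

# Hold out the final 12 months, refit both candidate orders on the
# training window, and compare out-of-sample forecast accuracy.
n        <- length(cotton_ts)
train    <- window(cotton_ts, end = time(cotton_ts)[n - 12])
test     <- window(cotton_ts, start = time(cotton_ts)[n - 11])
fit_cust <- Arima(train, order = c(4, 0, 6))
fit_auto <- auto.arima(train)

forecast::accuracy(forecast::forecast(fit_cust, h = 12), test)  # custom ARIMA
forecast::accuracy(forecast::forecast(fit_auto, h = 12), test)  # auto.arima
```

The "Test set" rows of each accuracy table (RMSE, MAE, MAPE) then support or contradict the in-sample verdict.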

#ARIMA using more recent data
cotton_r2 <- cotton %>%
filter(date >= as.Date('2016-01-01') & date <= as.Date('2023-01-31'))
cotton_r2 <- cotton_r2 %>% group_by(month) %>% mutate(mon_avg = mean(price))%>%
select(month, mon_avg)
#Drop Duplicate rows
cotton_r2 <- cotton_r2[!duplicated(cotton_r2),]
cotton_tsr <- ts(cotton_r2$mon_avg,start=c(2016),frequency=12)
##Stationary Test
adf.test(cotton_tsr, alternative = "stationary")
##
## Augmented Dickey-Fuller Test
##
## data: cotton_tsr
## Dickey-Fuller = -2.5097, Lag order = 4, p-value = 0.3667
## alternative hypothesis: stationary
## After first-order differencing
adf.test(diff(cotton_tsr), alternative ="stationary")
##
## Augmented Dickey-Fuller Test
##
## data: diff(cotton_tsr)
## Dickey-Fuller = -3.7655, Lag order = 4, p-value = 0.02446
## alternative hypothesis: stationary
#Differencing brings the p-value below 0.05 and creates significance, so d = 1
#Custom Arima Model
#Correlation Plot and Tuning selection
#ACF (q)
acf(diff(cotton_tsr),main='')

#q=1
#Looking at the impact prices in prior periods have on subsequent periods
#PACF (p)
pacf(diff(cotton_tsr),main='')

#p = 1: there is 1 significant partial autocorrelation lag
# Custom ARIMA (overfit)
cotton_fitR<- Arima(cotton_tsr, order=c(1,1,1))
cotton_fitR
## Series: cotton_tsr
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## -0.6098 0.8354
## s.e. 0.1782 0.1261
##
## sigma^2 = 0.003624: log likelihood = 117.75
## AIC=-229.5 AICc=-229.2 BIC=-222.21
#Check residuals
checkresiduals(cotton_fitR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)
## Q* = 16.158, df = 15, p-value = 0.3717
##
## Model df: 2. Total lags used: 17
#Auto-fit ARIMA (overfit)
auto_cottonR<- auto.arima(cotton_tsr)
auto_cottonR
## Series: cotton_tsr
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## -0.6098 0.8354
## s.e. 0.1782 0.1261
##
## sigma^2 = 0.003624: log likelihood = 117.75
## AIC=-229.5 AICc=-229.2 BIC=-222.21
checkresiduals(auto_cottonR)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)
## Q* = 16.158, df = 15, p-value = 0.3717
##
## Model df: 2. Total lags used: 17
##Forecast Plot
autoplot(forecast::forecast(auto_cottonR, h=12, level=c(80,95)))

##Forecast Custom
autoplot(forecast::forecast(cotton_fitR, h=12, level=c(80,95)))

#Printing Predictions
cotton_predictions <- forecast::forecast(cotton_fitR,h=12)
print(cotton_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2023           0.8309998 0.8398149 0.8344391 0.8377175 0.8357182 0.8369374
## 2024 0.8364993
##            Aug       Sep       Oct       Nov       Dec
## 2023 0.8361939 0.8366473 0.8363708 0.8365394 0.8364366
## 2024
cotton_predictions <- forecast::forecast(auto_cottonR,h=12)
print(cotton_predictions$mean)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2023           0.8309998 0.8398149 0.8344391 0.8377175 0.8357182 0.8369374
## 2024 0.8364993
##            Aug       Sep       Oct       Nov       Dec
## 2023 0.8361939 0.8366473 0.8363708 0.8365394 0.8364366
## 2024
GARCH Cotton Modeling
# Model Creation
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(0,0), include.mean = TRUE), distribution.model = "std")
# Fitting Model to Data
cotton_garch <- ugarchfit(spec = garch_model, data = cotton_ts)
print(cotton_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.703309 0.009548 73.6627 0.000000
## omega 0.001195 0.000339 3.5213 0.000429
## alpha1 0.999000 0.108225 9.2308 0.000000
## beta1 0.000000 0.052273 0.0000 1.000000
## shape 99.999979 65.555380 1.5254 0.127153
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.703309 0.042439 16.5722 0.00000
## omega 0.001195 0.000804 1.4866 0.13712
## alpha1 0.999000 0.106978 9.3384 0.00000
## beta1 0.000000 0.027784 0.0000 1.00000
## shape 99.999979 12.121320 8.2499 0.00000
##
## LogLikelihood : 256.0808
##
## Information Criteria
## ------------------------------------
##
## Akaike -1.3872
## Bayes -1.3334
## Shibata -1.3876
## Hannan-Quinn -1.3658
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 268.0 0
## Lag[2*(p+q)+(p+q)-1][2] 373.8 0
## Lag[4*(p+q)+(p+q)-1][5] 629.9 0
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 10.13 0.001457
## Lag[2*(p+q)+(p+q)-1][5] 12.12 0.002596
## Lag[4*(p+q)+(p+q)-1][9] 14.30 0.005270
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.0149 0.500 2.000 0.9028
## ARCH Lag[5] 4.2406 1.440 1.667 0.1534
## ARCH Lag[7] 4.9004 2.315 1.543 0.2348
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 34.1067
## Individual Statistics:
## mu 0.41257
## omega 0.05232
## alpha1 0.21244
## beta1 0.27042
## shape 21.91731
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.04528 0.9639
## Negative Sign Bias 0.52471 0.6001
## Positive Sign Bias 0.04531 0.9639
## Joint Effect 0.46610 0.9263
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 267.1 1.066e-45
## 2 30 277.4 2.253e-42
## 3 40 288.6 7.807e-40
## 4 50 289.7 7.205e-36
##
##
## Elapsed time : 0.1835911
#plot(cotton_garch, which = 1)
coef(cotton_garch)
## mu omega alpha1 beta1 shape
## 7.033091e-01 1.195185e-03 9.989999e-01 1.157115e-08 9.999998e+01
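With alpha1 near 1 and beta1 near 0, this fit sits close to the boundary of covariance stationarity. The persistence (alpha1 + beta1) and the implied volatility half-life make that concrete; a sketch, assuming `rugarch` is loaded and `cotton_garch` is the fit above:

```r
library(rugarch)

# Persistence of conditional variance; values near 1 mean shocks decay slowly.
pers <- persistence(cotton_garch)

# Half-life: periods for a variance shock to decay to half its impact.
half_life <- log(0.5) / log(pers)
cat("persistence:", round(pers, 4),
    " half-life (months):", round(half_life, 1), "\n")
```

`rugarch` also provides `halflife()` directly; the manual formula is shown here to make the relationship explicit.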
# Forecasting
horizon <- 3
cotton_forecast_garch <- ugarchforecast(cotton_garch, n.ahead = horizon)
forecast_mean_cotton <- as.numeric(cotton_forecast_garch@forecast$seriesFor)
actual_values_cotton <- as.numeric(window(cotton_ts, start = c(1993, 1)))
#plot(cotton_forecast_garch, n.plot = horizon, n.col = 1, plot.type = "single",
# main = "GARCH Forecast for cotton Prices", ylab = "Price", xlab = "Time") #%>%
#lines(cotton_ts[(length(cotton_ts)-horizon+1):length(cotton_ts)], col = "blue")
#Based on the significance results, let's try an AR(1) mean model next
garch_model <- ugarchspec(variance.model = list(model = "sGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(1,0), include.mean = TRUE), distribution.model = "std")
cotton_garch2 <- ugarchfit(spec = garch_model, data = cotton_ts)
cotton_garch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.620592 0.031862 19.4772 0.000000
## ar1 0.974716 0.015198 64.1350 0.000000
## omega 0.000651 0.000196 3.3136 0.000921
## alpha1 0.645926 0.196116 3.2936 0.000989
## beta1 0.251608 0.117291 2.1452 0.031940
## shape 4.978376 1.417768 3.5114 0.000446
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.620592 0.023988 25.8709 0.000000
## ar1 0.974716 0.020674 47.1470 0.000000
## omega 0.000651 0.000244 2.6709 0.007565
## alpha1 0.645926 0.222294 2.9057 0.003664
## beta1 0.251608 0.171291 1.4689 0.141862
## shape 4.978376 1.474353 3.3767 0.000734
##
## LogLikelihood : 619.2135
##
## Information Criteria
## ------------------------------------
##
## Akaike -3.3879
## Bayes -3.3234
## Shibata -3.3885
## Hannan-Quinn -3.3623
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 13.12 2.919e-04
## Lag[2*(p+q)+(p+q)-1][2] 13.87 3.741e-14
## Lag[4*(p+q)+(p+q)-1][5] 16.25 1.551e-06
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.3671 0.5446
## Lag[2*(p+q)+(p+q)-1][5] 1.1923 0.8148
## Lag[4*(p+q)+(p+q)-1][9] 2.5292 0.8334
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 1.173 0.500 2.000 0.2788
## ARCH Lag[5] 1.222 1.440 1.667 0.6684
## ARCH Lag[7] 2.415 2.315 1.543 0.6300
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 1.5523
## Individual Statistics:
## mu 0.99815
## ar1 0.09716
## omega 0.20910
## alpha1 0.13640
## beta1 0.24576
## shape 0.20950
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.6637 0.5073
## Negative Sign Bias 0.1623 0.8712
## Positive Sign Bias 0.4003 0.6892
## Joint Effect 1.5536 0.6700
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 17.78 0.5372
## 2 30 22.70 0.7903
## 3 40 29.60 0.8614
## 4 50 36.90 0.8983
##
##
## Elapsed time : 0.1476181
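A compact way to compare the two mean-model specifications is `rugarch::infocriteria()`, which returns the same four criteria printed in the fits above. A sketch, assuming both `cotton_garch` and `cotton_garch2` exist:

```r
library(rugarch)

# Side-by-side information criteria for the two mean models;
# lower values favor a specification.
ic <- cbind(infocriteria(cotton_garch), infocriteria(cotton_garch2))
colnames(ic) <- c("ARFIMA(0,0,0)", "ARFIMA(1,0,0)")
ic
```

On the values printed above, the AR(1) mean model has the lower Akaike and Bayes criteria.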
#Conditional variance (volatility) series from the standard GARCH(1,1) fit
cotton_vol <-ts(cotton_garch2@fit$sigma^2,start=c(1993),frequency=12)
#plot(cotton_vol,xlab="",ylab="",main="cotton_Volatility (GARCH[1,1])")
#Exponential GARCH (does not work quite as well)
Egarch_model <- ugarchspec(variance.model = list(model = "eGARCH",
garchOrder = c(1,1)),
mean.model = list(armaOrder =
c(1,0), include.mean = TRUE), distribution.model = "std")
cotton_egarch2 <- ugarchfit(spec = Egarch_model, data = cotton_ts)
cotton_egarch2
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(1,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.620592 0.031862 19.4772 0.000000
## ar1 0.974716 0.015198 64.1350 0.000000
## omega 0.000651 0.000196 3.3136 0.000921
## alpha1 0.645926 0.196116 3.2936 0.000989
## beta1 0.251608 0.117291 2.1452 0.031940
## shape 4.978376 1.417768 3.5114 0.000446
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.620592 0.023988 25.8709 0.000000
## ar1 0.974716 0.020674 47.1470 0.000000
## omega 0.000651 0.000244 2.6709 0.007565
## alpha1 0.645926 0.222294 2.9057 0.003664
## beta1 0.251608 0.171291 1.4689 0.141862
## shape 4.978376 1.474353 3.3767 0.000734
##
## LogLikelihood : 619.2135
##
## Information Criteria
## ------------------------------------
##
## Akaike -3.3879
## Bayes -3.3234
## Shibata -3.3885
## Hannan-Quinn -3.3623
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 13.12 2.919e-04
## Lag[2*(p+q)+(p+q)-1][2] 13.87 3.741e-14
## Lag[4*(p+q)+(p+q)-1][5] 16.25 1.551e-06
## d.o.f=1
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.3671 0.5446
## Lag[2*(p+q)+(p+q)-1][5] 1.1923 0.8148
## Lag[4*(p+q)+(p+q)-1][9] 2.5292 0.8334
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 1.173 0.500 2.000 0.2788
## ARCH Lag[5] 1.222 1.440 1.667 0.6684
## ARCH Lag[7] 2.415 2.315 1.543 0.6300
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 1.5523
## Individual Statistics:
## mu 0.99815
## ar1 0.09716
## omega 0.20910
## alpha1 0.13640
## beta1 0.24576
## shape 0.20950
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.6637 0.5073
## Negative Sign Bias 0.1623 0.8712
## Positive Sign Bias 0.4003 0.6892
## Joint Effect 1.5536 0.6700
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 17.78 0.5372
## 2 30 22.70 0.7903
## 3 40 29.60 0.8614
## 4 50 36.90 0.8983
##
##
## Elapsed time : 0.1544681
## mu ar1 omega alpha1 beta1 shape
## 0.6205920349 0.9747158449 0.0006507369 0.6459258710 0.2516076142 4.9783760600
#Conditional variance (volatility) series from the eGARCH(1,1) fit
ecotton_vol <-ts(cotton_egarch2@fit$sigma^2,start=c(1993),frequency=12)
#plot(ecotton_vol,xlab="",ylab="",main="cotton_Volatility (eGARCH[1,1])")
cor(cotton_vol, ecotton_vol)
## [1] 1
ts.plot(cotton_vol,ecotton_vol,col=c("green","red"),xlab="")
legend("topright",legend=c("Standard","Exponential"),col=c("green","red"),lty=c(1,1))

#No difference shown
#Inspecting the GARCH model object
names(cotton_garch2@model)
## [1] "modelinc" "modeldesc" "modeldata" "pars" "start.pars"
## [6] "fixed.pars" "maxOrder" "pos.matrix" "fmodel" "pidx"
## [11] "n.start"
## [1] "hessian" "cvar" "var" "sigma"
## [5] "condH" "z" "LLH" "log.likelihoods"
## [9] "residuals" "coef" "robust.cvar" "A"
## [13] "B" "scores" "se.coef" "tval"
## [17] "matcoef" "robust.se.coef" "robust.tval" "robust.matcoef"
## [21] "fitted.values" "convergence" "kappa" "persistence"
## [25] "timer" "ipars" "solver"
#Variance
cotton_garch_var <- cotton_garch2@fit$var
#Residuals
cotton_garch_res <- (cotton_garch2@fit$residuals)^2
#Plotting residuals and conditional variances
plot(cotton_garch_res, type = "l")
lines(cotton_garch_var, col = "green")

cotton_forecast_garch2 <- ugarchforecast(cotton_garch2, n.ahead = 12)
cotton_forecast_garch2
##
## *------------------------------------*
## * GARCH Model Forecast *
## *------------------------------------*
## Model: sGARCH
## Horizon: 12
## Roll Steps: 0
## Out of Sample: 0
##
## 0-roll forecast [T0=Feb 2023]:
## Series Sigma
## T+1 0.8382 0.03149
## T+2 0.8327 0.03925
## T+3 0.8273 0.04510
## T+4 0.8221 0.04976
## T+5 0.8170 0.05360
## T+6 0.8120 0.05683
## T+7 0.8072 0.05958
## T+8 0.8025 0.06194
## T+9 0.7979 0.06398
## T+10 0.7934 0.06577
## T+11 0.7890 0.06733
## T+12 0.7848 0.06870
ug_cotton <- cotton_forecast_garch2@forecast$sigmaFor
plot(ug_cotton, type = "l")

cotton_var_t <- c(tail(cotton_garch_var,20),rep(NA,10)) # gets the last 20 observations
cotton_res_t <- c(tail(cotton_garch_res,20),rep(NA,10)) # gets the last 20 observations
cotton_f <- c(rep(NA,20),(ug_cotton)^2)
plot(cotton_res_t, type = "l") #Residuals
lines(cotton_f, col = "orange") # Predictions
lines(cotton_var_t, col = "green") #Conditional variance
legend("topright",
legend = c("Residuals", "Predictions", "Conditional Variance"),
col = c("black", "orange", "green"),
lty = c(1, 1, 1),
cex = 0.8)

#Plot Predictions
#plot(cotton_forecast_garch2, main = "Forecasted cotton Prices (GARCH(1,1))")
cotton_mean_forecast <- as.numeric(cotton_forecast_garch2@forecast$seriesFor)
# Plot the mean forecasted values with the two confidence intervals
#plot(cotton_forecast_garch2, main = "Forecasted cotton Prices (GARCH(1,1))")
#ARIMA Models
forecast::accuracy(cotton_fit)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0002759028 0.05790952 0.03712278 -0.5037232 5.077883 0.2186312
## ACF1
## Training set 0.001309187
forecast::accuracy(auto_cotton)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0004544867 0.0606267 0.03717778 -0.07564934 5.035075 0.2189551
## ACF1
## Training set -0.002240768
forecast::accuracy(cotton_fitR)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.002263093 0.05912699 0.03867624 0.1458866 4.549521 0.2245412
## ACF1
## Training set -0.0242505
## [1] -1008.04
## [1] -961.3404
## [1] -229.5022
## [1] -222.2097
#GARCH Model
actual_values_cotton <- head(actual_values_cotton, length(forecast_mean_cotton))
mae <- mean(abs(forecast_mean_cotton - actual_values_cotton))
mse <- mean((forecast_mean_cotton - actual_values_cotton)^2)
rmse <- sqrt(mse)
# Print the results
cat(paste("MAE: ", mae, "\n"))
## MAE: 0.103256141089996
cat(paste("MSE: ", mse, "\n"))
## MSE: 0.0112779322589007
cat(paste("RMSE: ", rmse, "\n"))
## RMSE: 0.106197609478277
#GARCH Model 2
forecast_mean_cotton2 <- as.numeric(cotton_forecast_garch2@forecast$seriesFor)
actual_values_cotton2 <- as.numeric(window(cotton_ts, start = c(1993, 1)))
actual_values_cotton2 <- head(actual_values_cotton2, length(forecast_mean_cotton2))
cotton_mae <- mean(abs(forecast_mean_cotton2 - actual_values_cotton2))
cotton_mse <- mean((forecast_mean_cotton2 - actual_values_cotton2)^2)
cotton_rmse <- sqrt(cotton_mse)
# Print the results
cat(paste("MAE: ", cotton_mae, "\n"))
## MAE: 0.210298087834802
cat(paste("MSE: ", cotton_mse, "\n"))
## MSE: 0.0448996204296999
cat(paste("RMSE: ", cotton_rmse, "\n"))
## RMSE: 0.211895305350779
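The error metrics above compare multi-step forecasts against the earliest in-sample values rather than a held-out window; `rugarch::ugarchroll()` gives a proper rolling out-of-sample evaluation. A sketch, assuming `rugarch` is loaded and `garch_model` is the AR(1) sGARCH spec defined above:

```r
library(rugarch)

# Rolling 1-step-ahead forecasts over the last 36 observations,
# refitting every 12 observations on an expanding (recursive) window.
roll <- ugarchroll(garch_model, data = cotton_ts,
                   n.ahead = 1, forecast.length = 36,
                   refit.every = 12, refit.window = "recursive")

# fpm() reports out-of-sample forecast performance measures (MSE, MAE, DAC).
fpm(roll)
```

The same rolling evaluation can be run for both specs to compare them on genuinely unseen observations.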
EDA Results
# Major economic events appear to impact the prices of every commodity, producing sharp increases and decreases before prices level out. There do not seem to be significant price changes by day of the week. Commodity pricing did see some sharp swings between 2008 and 2012, which deserves additional exploration.
# In general, the 1990s showed steady commodity prices, even trending slightly downward. Things began to shift around 2002: most commodities saw a gradual incline, followed by sharp spikes in 2008, again in 2011, and once more around 2020 to 2022. Common factors behind the increases include the declining value of the US dollar, rising energy prices, and crop yields.
# Demand for products has been increasing as the world population grows. Developing countries and markets tend to consume more commodities, which raises prices on global markets. After the 2008 recession, demand increased alongside continued currency depreciation, leading to price spikes.
# The current economic situation is unique and has led to another commodity price surge for a number of reasons. High inflation continues to devalue currency and push prices up. A sharp increase in demand and manufacturing, following COVID-initiated supply disruptions, has also impacted prices, as have weather and inaccurate forecasting.
# The biggest industries using commodities similar to Swire's are food, fuel, and clothing. A disruption in demand from any one of these will impact commodity prices for Swire. For instance, the fuel industry uses a great deal of corn and sugar in its production; it is particularly vulnerable to economic events, which makes price forecasting difficult.
# As I explored the data, I realized there is a great deal of freedom in it and always something additional to look at. I have further questions and plots I'd like to create, and I will continue to add to this during the semester as I keep working on the project.
#At this point, no ethical concerns were uncovered.
Model Results and Business Problem
# Having accurate price predictions will help Swire with its business strategy. Knowing the direction price is likely to take, higher or lower, will tell Swire when to purchase and how much volume to purchase, the ideal being to buy commodities at the lowest price to maximize margins. The models have proven accurate enough to give directional information several months in advance, and the forecast range over the next few months is relatively narrow. Considering how much information these models can feed into purchase planning, they go a long way toward solving the business problem.